Abstract


Garbage  collector  performance  in  LISP  systems  on  custom  hardware  has 
been  substantially  improved  by  the  adoption  of  lifetime-based  garbage
collection  techniques.  To  date,  however,  successful  lifetime-based  garbage
collectors  have  required  special-purpose  hardware,  or  at  least  privileged  access
to  data  structures  maintained  by  the  virtual  memory  system.  I  present  here 
a  lifetime-based  garbage  collector  requiring  no  special-purpose  hardware  or 
virtual  memory  system  support,  and  discuss  its  performance. 
Introduction


The  pointer-oriented  semantics  of  LISP,  and  the  nature  of  heap  allocation, 
often  result  in  poor  virtual  memory  performance  on  general-purpose
computers.  Garbage  collection,  in  particular,  has  traditionally  required
examination  of  at  least  all  live  data  created  by  user  programs  (in  copying  garbage
collectors)  and  sometimes  of  storage  recovered  as  well  (in  mark-sweep
garbage  collectors).

Lieberman  and  Hewitt  introduced  in  1981  a  garbage  collection  algorithm 
based  on  the  lifetimes  of  objects  [9].  By  grouping  objects  according  to 
their  ages,  the  proposed  garbage  collector  avoided  examination  of  relatively 
old  objects  when  garbage  collecting  relatively  young  objects.  LISP
implementations  on  machines  with  special-purpose  hardware  or  instruction  sets
microcoded  to  support  this  algorithm  have  realized  significant  performance
improvements,  as  noted  by  Moon  [10]. 

Moon’s  contention  was  that  the  overhead  of  bookkeeping  required  to  keep 
track  of  references  to  newly-created  objects  in  lifetime-based  garbage
collectors  on  general-purpose  computers  would  result  in  prohibitive  performance
degradation.  Shaw  [12]  has  recently  described  a  scheme  in  which  the
virtual  memory  hardware  present  on  modern  general-purpose  computers  can
be  used  to  keep  track  of  newly-stored  pointers,  provided  one  has  access  to 
the  data  structures  used  by  the  virtual  memory  system. 

I  describe  here  the  portable  lifetime-based  garbage  collector  used  in  Lucid 
Common  LISP  on  the  Apollo  and  Sun  workstations.  As  Lucid  Common 
LISP  is  portable,  this  garbage  collector  does  not  have  the  cooperation  of  the 
virtual  memory  systems  on  the  computers  it  runs  on.  Nor  does  it  make  use
of  special-purpose  hardware;  still,  the  techniques  of  lifetime-based  garbage 
collection  are  sufficiently  powerful  that  overall  system  performance  is  often 
enhanced,  and  delays  for  garbage  collection  are  unnoticeable,  resulting  in 
better  interactive  behavior. 
A  Note  on  Terminology

The  popularity  of  lifetime-based  garbage  collectors  since  the  publication  of
Lieberman  and  Hewitt’s  paper  has  led  to  a  number  of  implementations,
many  of  whose  designers  have  invented  their  own  terms  to  describe  their
work.  In  what  follows,  I  usually  use  Moon’s  terminology  where  it  is
applicable,  as  this  work  was  influenced  most  directly  by  his.  In  discussing  the
techniques  of  copying  garbage  collectors,  I  use  the  terminology  of  Fenichel 
and  Yochelson  [6]:  memory  in  a  copying  garbage  collector  is  divided  into 
semispaces,  only  one  of  which  is  in  use  at  any  time  (the  current  semispace), 
except  during  garbage  collection,  when  both  are  used. 

The  names  of  LISP  data  types  are  used  in  discussions  where  they  pertain; 
in  particular,  list  cells  are  referred  to  as  cons  cells. 

An  understanding  of  traditional  garbage  collection  techniques  is  assumed;  in 
particular,  the  reader  is  assumed  to  understand  the  behavior  of  mark-sweep 
garbage  collectors,  and  of  Cheney’s  copying,  compacting  garbage  collection 
algorithm  [5].  This  last  will  be  referred  to  as  copying  garbage  collection, 
or  sometimes  as  stop-and-copy  garbage  collection.  The  term  root  set  will 
be  used  to  refer  to  the  set  of  objects  explicitly  specified  to  the  garbage 
collector  for  preservation;  all  objects  preserved  through  a  garbage  collection 
are  either  in  the  root  set  or  are  encountered  in  some  directed  walk  beginning 
at  an  object  in  the  root  set. 

During  a  copying  garbage  collection,  the  space  copied  from  is  called
fromspace;  the  space  copied  to  is  called  tospace,  or  copyspace.  Scavenging  is  the
operation  that  copies  objects  referenced  by  a  set  of  roots  from  fromspace  to 
tospace,  and  updates  in  the  root  set  the  references  to  the  copied  objects.  It 
may  be  used  as  a  transitive  verb;  thus  ‘scavenging  the  stack’  would  mean 
finding  the  objects  in  fromspace  pointed  to  by  pointers  in  the  stack,  copying 
them  and  their  descendants  to  tospace,  and  updating  the  pointers  in  the 
stack  to  point  to  the  newly  relocated  objects. 

Fromspace  and  tospace  together  are  called  dynamic  space,  as  the  objects 
in  them  are  moved  dynamically.  In  systems  with  lifetime-based  garbage
collectors,  if  there  is  a  space  where  long-lived  objects  are  maintained  and  are
garbage-collected  with  a  copying  garbage  collector  that  is  not  the
lifetime-based  garbage  collector,  this  space  is  also  called  dynamic  space,  and  the
garbage  collector  is  referred  to  as  the  dynamic  garbage  collector.

This  paper  assumes  some  knowledge  of  the  behavior  of  virtual  memory
systems;  in  particular,  an  understanding  of  terms  like  page,  main  memory,  and
backing  store.  The  term  dirty  bit  is  sometimes  used;  this  refers  to  the
information  maintained  on  a  per-page  basis  by  virtual  memory  systems,  stating
whether  the  page  in  question  has  been  modified  (made  “dirty”)  while  it  has 
been  in  main  memory.  The  sections  of  main  memory  that  hold  individual 
virtual  memory  pages  are  called  page  frames. 

In  discussions  where  it  is  advantageous  to  consider  particular  architectures, 
I  use  as  an  example  the  Motorola  MC68020,  using  the  assembler  syntax  in 
[11].  My  terminology  differs  from  theirs  only  in  that,  when  I  refer  to  a  word, 
I  mean  a  32-bit  quantity;  Motorola  refer  to  these  as  longwords. 

Where  specific  LISP  tagging  schemes  are  considered,  I  use  that  employed 
by  Lucid  Common  LISP  on  the  MC68020. 
Part  I

Motivation  and  Prior  Work 


1  Garbage  Collection  in  Modern  LISP  Systems

1.1  Copying  Garbage  Collection 

Fenichel  and  Yochelson  [6]  have  described  how  performance  will  degrade 
over  time  in  LISP  systems  utilizing  virtual  memory.  Their  solution,  copying 
garbage  collection,  as  further  modified  by  Cheney  [5],  was  widely  adopted  in 
modern  LISP  systems;  but  its  performance  was  limited  by  the  need  to  scan 
a  potentially  large  root  set,  and  to  move  from  one  area  to  the  other,  on  each 
garbage  collection,  all  the  structures  maintained  through  a  computation.  In 
a  large  LISP  system  running  on  a  machine  with  virtual  memory,  garbage 
collections  could  result  in  quite  lengthy  pauses;  enough  so  that  White
prescribes  a  scheme  in  which  garbage  collection  is  avoided  altogether  in  virtual
memory  systems  [14]. 

Certainly  refinements  are  possible.  One  popular  refinement  used  in  many 
LISP  systems  [3]  is  to  create  in  a  “static”  space  those  objects  one  knows  will
be  relatively  permanent,  and  to  scan  these  along  with  the  root  set;  then,
while  pointers  in  static  objects  to  objects  in  dynamic  space  are  still  updated
during  a  garbage  collection,  static  objects  (as  their  name  implies)  are  not 
relocated,  and  the  work  of  copying  them  is  saved. 

Another  refinement  uses  an  “unscanned”  space,¹  in  which  permanent
immutable  objects  that  contain  only  pointers  to  the  static  or  unscanned  spaces
are  stored;  because  static  objects  are  not  copied  by  the  garbage  collector, 
the  pointers  to  them  need  not  change,  and  so  unscanned  space  need  not  be 
scanned  by  the  garbage  collector;  and,  of  course,  the  objects  contained  in  it 
will  not  be  relocated,  so  that,  again,  the  work  of  copying  them  is  saved. 

¹  Brooks  et  al.  [3]  refer  to  this  as  a  “read-only”  space;  however,  to  avoid  confusion  with,
for  example,  pure  shared  pages,  and  to  concentrate  exclusively  on  the  garbage  collection  issue,
I  refer  to  it  as  unscanned.
There  is  something  unsatisfactory  about  this  sort  of  refinement,  however.
The  garbage  collector  does  not  needlessly  transport  objects  placed  in  static 
or  unscanned  spaces,  or  even  examine  those  in  unscanned  space.  But  the 
storage  they  occupy  can  never  be  recovered,  either,  as  these  spaces  are  not, 
by  their  nature,  garbage-collected.  Thus,  in  order  to  decide  whether  to
place  an  object  in  a  static  space,  that  is,  a  space  whose  contents  are  never 
relocated,  the  programmer  must  think  about  its  lifetime;  and  this  begins  to 
smack  of  the  sort  of  storage  management  details  from  which  LISP  purports 
to  free  programmers.  Similarly,  when  deciding  whether  to  place  an  object 
in  an  unscanned  space,  whose  contents  will  not  be  scanned  by  the  garbage
collector,  the  programmer  must  decide  whether  it  contains  pointers  to
dynamic  objects,  and  this,  again,  is  the  sort  of  task  from  which  we  would  like
to  free  the  programmer. 

Although  we  are  unsatisfied  with  the  additional  tasks  these  refinements  force 
on  the  programmer,  we  can  see  that  they  are  motivated  by  a  desire  to 
perform  two  valuable  optimizations:  reduction  of  the  size  of  the  root  set, 
and  reduction  of  the  number  of  times  that  relatively  permanent  objects 
are  copied.  We  may  understand  lifetime-based  garbage  collection  as  an 
automation  of  these  optimizations. 


1.2  Lifetime-Based  Garbage  Collection 

1.2.1  Lieberman  and  Hewitt’s  Garbage  Collector 

Baker  [1]  introduced  in  1978  a  copying  garbage  collection  algorithm  that 
operated  in  an  incremental  fashion:  the  work  of  transporting  objects  from 
one  semispace  to  the  other  was  interleaved  with  the  normal  object  creation 
and  manipulation  functions  of  the  LISP  system.²

²  Incremental  garbage  collection  has  the  advantage  that  there  are  no  long  pauses  for
garbage  collection;  however,  I  choose  not  to  discuss  it  in  any  detail  here,  as  its  practical
implementation  requires  the  use  of  an  architectural  feature  called  the  invisible  pointer  [8],
which  is  not  usually  present  on  general-purpose  machines.  As  will  be  obvious,  however,
Lieberman  and  Hewitt’s  methods  have  immediate  application  to  stop-and-copy  garbage
collection,  despite  the  fact  that,  as  originally  presented,  they  make  yet  another  use  of
invisible  pointers.  This  use,  however,  may  be  circumvented  on  general-purpose  computers.

In  practice,  as  implemented  on  the  Symbolics  3600,  Baker’s  garbage  collector
suffered  from  poor  virtual  memory  performance  (see  note  A.1).  Lieberman
and  Hewitt  [9]  described  a  modification  to  Baker’s  algorithm,  in  which,
rather  than  being  divided  into  semispaces,  memory  was  divided  into  many
small  sections  called  regions.

At  any  time  there  is  a  current  creation  region,  in  which  new  objects  are 
allocated.  When  the  current  creation  region  is  filled,  a  new,  empty  region 
is  allocated  to  be  the  current  creation  region,  and  objects  are  created  there, 
rather  than  in  the  old  region. 

Each  region  has  a  number,  called  a  generation  number.  Generation
numbers  increase  monotonically  with  time.  Occasionally  the  garbage  collector
will  be  run  on  the  contents  of  a  region.  When  this  happens,  a  new  region  is 
allocated;  the  live  contents  of  the  old  region  are  copied  into  it,  and  the  old 
region’s  storage  is  recycled.  The  old  region’s  generation  number  is  retained 
for  the  new  region.  Thus  this  scheme  distinguishes  between  the  number  of 
garbage  collections  that  an  object  may  survive,  and  the  object’s
chronological  age;  this  is  motivated  because  the  garbage  collector  is  run  on  regions
individually,  and  thus  there  is  not  necessarily  a  relation  between  the  two 
numbers. 

In  a  traditional  copying  garbage  collection  scheme,  the  garbage  collection 
of  any  of  these  regions  would  require  scanning  all  others,  both  to  discover 
which  objects  in  the  region  were  being  referenced  by  objects  outside  the 
region,  and  thus  needed  to  be  preserved,  and  also  to  update  the  pointers 
from  other  regions  to  objects  within  the  garbage-collected  region,  as  these 
objects  will  be  relocated  during  the  garbage  collection. 

To  avoid  the  necessity  of  scanning  all  other  regions,  Lieberman  and  Hewitt’s
garbage  collector  maintains  for  each  region  an  entry  table.  The  entry  table 
is  a  table  of  pointers  to  objects  in  the  region.  Pointers  from  outside  the 
region  to  objects  inside  it  are  made  to  point  to  entries  in  the  entry  table. 
Along  with  the  stack  and  registers,  the  table  is  used  as  the  root  set  when 
the  region  is  garbage-collected;  thus  other  regions  need  not  be  scanned. 
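The indirection through an entry table can be sketched as follows. This is a minimal illustration in Python, not Lieberman and Hewitt's implementation: the names Obj, EntryRef, Region, export and collect are hypothetical, the stack and registers are modeled as a plain list of roots, and entries are never removed from the table.

```python
class Obj:
    """A heap object; fields hold atoms, direct references, or EntryRefs."""
    def __init__(self, *fields):
        self.fields = list(fields)


class EntryRef:
    """A pointer from outside a region, routed through that region's entry table."""
    def __init__(self, region, index):
        self.region, self.index = region, index

    def deref(self):
        return self.region.entry_table[self.index]


class Region:
    def __init__(self, generation):
        self.generation = generation
        self.objects = []        # everything currently stored in the region
        self.entry_table = []    # objects referenced from outside the region

    def allocate(self, *fields):
        obj = Obj(*fields)
        self.objects.append(obj)
        return obj

    def export(self, obj):
        """Hand out an indirect reference for use by an object in another region."""
        self.entry_table.append(obj)
        return EntryRef(self, len(self.entry_table) - 1)

    def collect(self, roots):
        """Copy live objects; only roots and the entry table are examined."""
        members = {id(o) for o in self.objects}
        forwarded, survivors = {}, []

        def copy(value):
            if not isinstance(value, Obj) or id(value) not in members:
                return value                      # atoms and foreign references
            if id(value) not in forwarded:
                new = Obj()
                forwarded[id(value)] = new        # register first, so cycles end
                survivors.append(new)
                new.fields = [copy(f) for f in value.fields]
            return forwarded[id(value)]

        new_roots = [copy(r) for r in roots]
        self.entry_table = [copy(e) for e in self.entry_table]
        self.objects = survivors                  # old storage is recycled
        return new_roots
```

Because the holder in the other region keeps only an EntryRef, relocating an object requires updating one table slot; the other region is neither scanned nor modified.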

As  proposed  for  implementation  on  the  MIT  LISP  Machine  [8],  the  entry
table  was  implemented  using  invisible  pointers,  an  architectural  feature  which
causes  references  to  a  location  in  memory  to  be  transparently  forwarded  to
another  location.  Thus  no  software  overhead  was  incurred  to  check  whether
a  pointer  to  an  object  actually  pointed  to  an  entry  in  an  entry  table.  In
practice,  however,  the  lifetime-based  garbage  collectors  discussed  in  the  present
text  do  not  maintain  entry  tables;  rather,  they  use  structures  that  instead 
somehow  record  the  locations  of  pointers  outside  a  region  to  objects  inside 
the  region. 

Lieberman  and  Hewitt’s  scheme  called  for  making  entry  table  entries  only 
for  objects  pointed  to  from  regions  older  than  the  region  being
garbage-collected;  these  were  referred  to  as  “pointers  forwards  in  time.”  Objects
pointed  to  only  from  regions  younger  than  the  region  being  garbage-collected,
called  “pointers  backwards  in  time,”  were  not  entered  in  the  entry  table;
and  such  regions  were  to  be  scanned  as  part  of  the  root  set  during  garbage
collection.  This  optimization  was  motivated  by  the  consideration  that  the
majority  of  pointers  would  be  pointers  backwards  in  time,  and  that  the
regions  most  often  garbage-collected  would  be  the  youngest;  and  thus  there
would  be  few  regions  younger  than  them  to  be  scanned. 

But  what  is  it  that  makes  this  garbage  collector  lifetime-based?  Lieberman 
and  Hewitt  observed  that  the  mortality  rate  of  older  objects  is  much  lower 
than  that  of  younger  objects.  Thus  garbage  collections  spaced  at  set  intervals 
will  likely  reclaim  less  space  from  regions  containing  older  objects  than  from 
regions  containing  younger  objects,  so  that  more  storage  will  be  reclaimed 
per  unit  of  garbage  collector  work  by  garbage-collecting  younger  regions 
more  often  than  older  regions. 

It  should  be  noted,  then,  that  lifetime-based  garbage  collectors  afford  two
separate  optimizations.  By  dividing  memory  into  regions  with  recording
structures  specifying  the  objects  that  point  to  them,  they  limit  the  size  of
the  root  set.  By  allowing  objects  to  be  segregated  according  to  age,  and
garbage-collecting  those  areas  containing  relatively  permanent  objects  less
often,  they  limit  the  amount  of  copying  that  must  be  done  in  a  garbage
collection.


1.2.2  Address  Space  Utilization  and  Physical  Memory  Utilization
in  Lifetime-based  Garbage  Collection


Lieberman  and  Hewitt’s  scheme  allows  for  utilization  during  user
computations  of  a  higher  percentage  of  the  available  address  space  than  does
simple  stop-and-copy  garbage  collection.  This  is  because  not  all  regions
are  garbage-collected  simultaneously,  and  only  as  much  storage  as  will  be
needed  to  hold  the  data  in  the  region  or  regions  actually  currently  being
garbage-collected  must  be  maintained  free.  In  simple  stop-and-copy
garbage  collection  (or  in  Baker’s  incremental  garbage  collection),  however,  a
semispace’s  worth  of  free  storage  must  be  maintained  at  all  times;  otherwise 
a  flip  may  fail. 

The  traditional  wisdom  about  copying  garbage  collectors,  as  first  advanced 
by  Fenichel  and  Yochelson  in  [6],  is  that  address  space  utilization  is  of 
no  great  importance;  garbage  collections  are  performed  to  improve
locality  of  reference,  and  address  space  recycling  is  only  a  secondary  concern.  In
lifetime-based  garbage  collectors,  however,  the  frequency  of  garbage
collections  of  younger  levels  is  such  that  both  the  space  being  copied  from  and
the  space  being  copied  to  must  be  considered  part  of  the  working  set.  We 
expect,  then,  that  the  best  results  will  be  attained  by  memory  layouts  that 
make  the  most  efficient  possible  use  of  the  address  space  occupied  by  the 
most  frequently  garbage-collected  levels. 

Lieberman  and  Hewitt’s  garbage  collector  does  allow  for  utilization  of  a 
greater  portion  of  the  address  space  during  user  computations  than  would  a 
scheme  that  statically  maintained  semispaces  for  each  generation,  because 
the  space  occupied  by  a  region  just  copied  from  may  be  re-utilized  in  the 
copying  of  another  region.  However,  because  the  space  occupied  by  the 
youngest  objects  changes  with  each  garbage  collection  of  the  youngest
generation,  the  virtual  memory  performance  of  this  system  will  not  be  all  we
might  hope  for. 
2  Previous  Implementations


Lieberman  and  Hewitt’s  work  has  provided  a  basis  for  several  lifetime-based 
garbage  collectors.  I  discuss  here  three  garbage  collectors  examined  during
the  design  of  Lucid’s  lifetime-based  garbage  collector;  these  are:  Ungar’s
generation-scavenging  garbage  collector  [13],  the  Tektronix  Large  Object
Space  Smalltalk  garbage  collector  [4],  and  Moon’s  ephemeral  garbage
collector  for  the  Symbolics  3600  [10].  The  points  considered  in  each  case  are:
division  of  memory,  determination  of  the  root  set,  and  advancement  policy. 


2.1  Ungar’s  Generation-Scavenging  Garbage  Collector 

2.1.1  Description 

Ungar’s  Berkeley  Smalltalk  system  divides  objects  into  two  classes:  new  and 
old.  The  space  that  old  objects  live  in  is  called  OldSpace.  These  old  objects 
are  garbage-collected  offline;  which  is  to  say,  not  at  all  during  a  Smalltalk
session.  There  are  three  spaces  for  new  objects:  NewSpace,  where  new
objects  are  created,  PastSurvivorSpace,  where  new  objects  are  also  stored,
but  are  never  created,  and  FutureSurvivorSpace,  which  is  the  space  into 
which  PastSurvivorSpace  and  NewSpace  are  copied. 

New  objects  are  always  created  in  NewSpace.  Every  time  a  pointer  is  set, 
if  it  is  a  pointer  from  OldSpace  to  NewSpace  or  PastSurvivorSpace,  the
location  in  which  the  pointer  was  set,  that  is,  the  referring  object,  is  recorded
in  a  table,  called  the  remembered  set.  When  NewSpace  is  full,  a
copying  garbage  collection  is  performed  from  NewSpace  and  PastSurvivorSpace
to  FutureSurvivorSpace,  using  as  the  root  set  the  references  to  NewSpace
and  PastSurvivorSpace  from  the  objects  in  the  remembered  set.
FutureSurvivorSpace  and  PastSurvivorSpace  are  then  exchanged.

Each  object  has  associated  with  it  a  generation  count;  when  an  object  has 
survived  a  certain  number  of  garbage  collections,  it  is  copied  into  OldSpace, 
rather  than  PastSurvivorSpace. 
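The scheme just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Ungar's code: the names Heap, store and scavenge are hypothetical, the tenuring threshold TENURE_AGE is invented (the description says only "a certain number" of collections), and an explicit roots argument stands in for whatever interpreter state the real system also treats as roots.

```python
TENURE_AGE = 2   # assumed threshold, chosen only for the example


class Obj:
    def __init__(self, *fields):
        self.fields = list(fields)
        self.age = 0          # generation count: scavenges survived so far
        self.space = "new"    # "new", "past" (survivor) or "old"


class Heap:
    def __init__(self):
        self.new_space, self.past_survivors, self.old_space = [], [], []
        self.remembered = []  # old objects that may refer to young objects

    def allocate(self, *fields):
        obj = Obj(*fields)
        self.new_space.append(obj)
        return obj

    def store(self, holder, index, value):
        """Every pointer store is checked; old-to-young stores are recorded."""
        holder.fields[index] = value
        if (holder.space == "old" and isinstance(value, Obj)
                and value.space != "old" and holder not in self.remembered):
            self.remembered.append(holder)

    def scavenge(self, roots):
        future_survivors, forwarded = [], {}

        def copy(value):
            if not isinstance(value, Obj) or value.space == "old":
                return value                   # atoms and old objects stay put
            if id(value) not in forwarded:
                new = Obj()
                new.age = value.age + 1
                forwarded[id(value)] = new
                if new.age >= TENURE_AGE:
                    new.space = "old"          # tenured into OldSpace
                    self.old_space.append(new)
                else:
                    new.space = "past"         # lands in FutureSurvivorSpace
                    future_survivors.append(new)
                new.fields = [copy(f) for f in value.fields]
            return forwarded[id(value)]

        for holder in self.remembered:         # remembered set supplies the roots
            holder.fields = [copy(f) for f in holder.fields]
        new_roots = [copy(r) for r in roots]
        # Keep remembering only old objects that still refer to young ones,
        # including objects tenured during this scavenge.
        candidates = self.remembered + [o for o in forwarded.values()
                                        if o.space == "old"]
        self.remembered = [h for h in candidates
                           if any(isinstance(f, Obj) and f.space != "old"
                                  for f in h.fields)]
        # NewSpace is emptied; the survivor spaces exchange roles.
        self.new_space, self.past_survivors = [], future_survivors
        return new_roots
```

Note that OldSpace is never traversed: only the objects named in the remembered set are examined, which is what keeps the root set small.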
2.1.2  Address  Space  Utilization

Ungar’s  generation-scavenging  garbage  collector,  like  that  of  Ballard  and
Shirron  [2],  on  which  it  is  based,  is  capable  of  utilizing  more  than  half  of 
its  address  space  at  any  time,  as  only  FutureSurvivorSpace  remains  unused 
during  a  computation.  Because  the  two  survivor  spaces  are  much  smaller 
than  either  NewSpace  or  OldSpace,  a  large  percentage  of  the  address  space 
is  in  use  during  user  computations. 


2.1.3  Suitability 

There  is  an  obvious  problem  with  the  use  of  an  approach  that  requires  the 
storage  of  generation  counts.  LISP  systems  on  general-purpose  computers 
usually  have  highly-optimized  storage  formats;  thus  a  cons,  for  example,  will 
consist  of  exactly  two  pointers,  with  no  space  for  a  generation  count.  This 
is  not  nearly  such  a  problem  in  Smalltalk  systems,  where  dynamic  objects 
are  usually  vector-structured,  with  object  references  to  the  object’s  class, 
instance  variables,  etc.;  but  in  high-performance  LISP  systems  we  would 
probably  have  to  store  object  generation  counts  externally. 

Suppose  we  were  to  store  four  bits  of  age  information  for  each  object.  In 
order  to  memory-map  these,  we  would  need  to  allocate  four  bits  of  storage 
for  every  sixty-four  bits  stored  in  the  ephemeral  spaces,  as  this  is  the  greatest
common  divisor  of  LISP  object  sizes  on  thirty-two  bit  machines.  This
gives  some  6.25%  additional  storage  required,  in  addition  to  that  required 
for  structures  used  to  record  the  remembered  set.  We  should  like  to  avoid 
this  overhead  if  at  all  possible. 

2.2  The  Tektronix  Large  Object  Space  Smalltalk  Garbage 
Collector 

2.2.1  Description 

The  Tektronix  Large  Object  Space  Smalltalk  implementation  [4]  includes  a 
generation-scavenging  garbage  collector  that  differs  from  Ungar’s  mainly  in 
the  organization  of  memory.  Memory  is  divided  into  seven  regions,  each  of 
which  consists  of  two  semispaces.  Objects  still  include  generation  counts;
during  a  garbage  collection,  they  are  copied  from  one  semispace  to  the  other, 
and  their  generation  counts  are  incremented.  When  an  object’s  generation 
count  reaches  some  preset  value,  it  is  advanced  to  the  next  region. 

Garbage  collection  is  stop-and-copy,  motivated  by  the  filling  of  a  region. 
Stores  to  the  stack  are  not  recorded;  rather,  the  stack  is  scanned  on  each 
garbage  collection.  Remembered  set  tables,  as  used  in  Ungar’s  Berkeley 
Smalltalk  garbage  collector,  are  maintained  for  each  region,  in  order  to  limit 
the  size  of  the  root  set.  These  tables  contain  only  pointers  forwards  in  time; 
that  is,  pointers  from  older  objects  to  newer  objects.  As  in  Lieberman  and 
Hewitt’s  garbage  collector,  the  entirety  of  each  younger  region  is  used  as 
part  of  the  root  set  whenever  an  older  region  is  garbage-collected.


2.2.2  Address  Space  Utilization 

As  in  Cheney’s  or  Baker’s  garbage  collectors,  during  user  computations  the
Tektronix  garbage  collector  uses  half  of  its  address  space  for  empty
semispaces.


2.2.3  Suitability 

We  anticipate  problems  with  using  in  LISP  systems  a  garbage  collection 
scheme  like  that  used  in  Tektronix’s  Large  Object  Space  Smalltalk.  One 
problem  is  that  of  recording  generation  counts;  we  discussed  this  in
considering  Ungar’s  generation-scavenging  garbage  collector,  in  Section  2.1.3,
above.  Furthermore,  as  discussed  in  Section  1.2.2  above,  the  organization  of 
memory  into  semispaces  should  result  in  poorer  virtual  memory  performance 
than  we  might  hope  for. 
2.3  The  Symbolics  Ephemeral  Garbage  Collector

2.3.1  Description 

The  Symbolics  ephemeral  garbage  collector  uses  an  allocation  scheme
different  from  that  of  either  the  Berkeley  Smalltalk  or  the  Tektronix  Smalltalk
garbage  collectors.  Memory  is  divided  into  areas.  The  user  may  specify  for 
each  area  the  number  of  ephemeral  levels,  the  capacity  of  the  youngest  level, 
and  the  ratio  of  the  capacity  of  each  succeeding  level  to  that  of  the  youngest 
level. 

Ephemeral  levels  are  not  divided  into  semispaces;  rather,  when  a  level  is 
garbage-collected,  live  objects  within  it  are  copied  into  the  next  older  level. 
As  in  Lieberman  and  Hewitt’s  garbage  collection  algorithm  [9],  ephemeral 
garbage  collection  is  incremental,  and  proceeds  in  parallel  with  user
computation.  Ephemeral  levels  are  garbage-collected  independently  of  each  other;
pointers  ‘backwards  in  time’  (see  Section  1.2.1)  are  recorded  by  the  same 
means  as  are  other  pointers. 

The  capacity  of  an  ephemeral  level  is  the  number  of  words  that  may  be 
allocated  in  it  before  a  garbage  collection  is  initiated.  A  garbage  collection 
is  motivated  when  the  youngest  level’s  capacity  is  exceeded;  this  causes  its 
contents  to  be  garbage-collected  into  the  next  oldest  level.  If  that  level’s 
capacity  is  exceeded,  its  contents  are  copied  into  the  following  level;  objects 
that  survive  through  all  the  levels  are  advanced  to  dynamic  space.  Storage  in 
dynamic  space  is  recovered  with  a  separate  Baker-style  incremental  garbage 
collector. 
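The capacity-driven advancement policy can be illustrated with a small simulation. This is a sketch only: it collects each level all at once rather than incrementally, it takes the per-level capacities directly (rather than deriving them from the user's ratio parameter), and it replaces tracing with a caller-supplied set of live object names. EphemeralArea and its methods are hypothetical names.

```python
class EphemeralArea:
    def __init__(self, capacities):
        self.capacities = capacities             # words per level, youngest first
        self.levels = [[] for _ in capacities]   # entries are (name, size_in_words)
        self.dynamic_space = []                  # destination after the last level

    def used(self, i):
        return sum(size for _, size in self.levels[i])

    def allocate(self, name, size, live):
        """Allocate in the youngest level, then collect any level over capacity."""
        self.levels[0].append((name, size))
        i = 0
        while i < len(self.levels) and self.used(i) > self.capacities[i]:
            self.collect(i, live)
            i += 1                               # the receiving level may overflow too

    def collect(self, i, live):
        """Copy the live contents of level i into the next older level."""
        survivors = [entry for entry in self.levels[i] if entry[0] in live]
        if i + 1 < len(self.levels):
            self.levels[i + 1].extend(survivors)
        else:
            self.dynamic_space.extend(survivors)
        self.levels[i] = []                      # the level's storage is reused
```

No generation counts are stored: an object's age is implied by the level in which it currently resides.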

The  scheme  used  for  recording  the  root  set  is  different  from  those  of  either 
of  the  Smalltalk  implementations;  instead  of  maintaining  remembered  sets 
(or  entry  tables  in  the  sense  of  Lieberman  and  Hewitt’s  original  paper,
which  the  3600  could  implement  because  of  its  invisible  pointer  hardware), 
the  Symbolics  scheme  instead  maintains  one  mark  bit  per  page  for  each 
ephemeral  level.  It  detects  when  a  pointer  into  an  ephemeral  level  is  being 
stored,  and  marks  the  page  in  which  the  reference  occurred.  When  an 
ephemeral  level  is  scavenged,  marked  pages  are  scanned  for  references  to 
that  level,  and  such  references,  if  found,  are  used  as  roots. 

Note  that  the  mark  bits  resemble  the  dirty  bits  maintained  by  virtual
memory  systems;  but,  unlike  dirty  bits,  they  are  only  set  when  the  word  stored  is
actually  a  pointer  to  an  ephemeral  level.  No  great  harm  is  done  if  the  word
stored  was  not  a  pointer  to  an  ephemeral  level,  though,  and  sometimes  at
scavenging  time  the  mark  bit  will  be  set  for  a  page  that  contains  no
references  to  ephemeral  objects,  as  a  word  that  is  not  a  pointer  to  an  ephemeral
object  may  overwrite  all  pointers  to  ephemeral  objects  on  a  page  without 
causing  the  mark  bit  to  be  reset.  Shaw  [12]  has  exploited  this  similarity  to 
a  table  of  dirty  bits  in  his  lifetime-based  garbage  collector. 
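A per-page mark-bit table of this kind might be modeled as follows. The sketch is mine, not Symbolics': Ptr stands in for a hardware-tagged pointer word (plain integers are non-pointer data), the store method plays the role of the store-time check that the 3600 performs with hardware assistance, and the names Memory, store and roots_for are hypothetical.

```python
from collections import namedtuple

PAGE_WORDS = 256
Ptr = namedtuple("Ptr", "address")   # a tagged pointer word; ints are non-pointers


class Memory:
    def __init__(self, n_pages, level_ranges):
        self.words = [0] * (n_pages * PAGE_WORDS)
        self.level_ranges = level_ranges      # one address range per ephemeral level
        # marks[level][page]: the page may hold a pointer into that level
        self.marks = [[False] * n_pages for _ in level_ranges]

    def level_of(self, value):
        """Return the ephemeral level a word points into, or None."""
        if isinstance(value, Ptr):
            for level, addresses in enumerate(self.level_ranges):
                if value.address in addresses:
                    return level
        return None

    def store(self, address, value):
        """Store a word; mark the page only if the word points into a level."""
        self.words[address] = value
        level = self.level_of(value)
        if level is not None:
            self.marks[level][address // PAGE_WORDS] = True

    def roots_for(self, level):
        """Scan marked pages and return the addresses holding pointers into level."""
        roots = []
        for page, marked in enumerate(self.marks[level]):
            if marked:
                base = page * PAGE_WORDS
                for address in range(base, base + PAGE_WORDS):
                    if self.level_of(self.words[address]) == level:
                        roots.append(address)
        return roots
```

Overwriting the only ephemeral pointer on a page with a non-pointer leaves the mark set, so the page is still scanned at scavenging time but contributes no roots; this is the harmless imprecision described above.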

Scanning  the  entirety  of  each  marked  page  may  sound  wasteful,  but  the  3600 
has  hardware  to  assist  in  the  detection  of  references  to  ephemeral  levels,  and 
so  a  256-word  page  with  no  such  references  is  scanned  in  85  microseconds 
[10,  page  242]. 

The  scheme  used  in  the  Symbolics  ephemeral  garbage  collector  for  recording 
pointers  into  ephemeral  spaces  is  in  fact  more  complex  than  is  implied  by 
the  short  description  above.  Two  tables  are  actually  maintained;  one,  called 
the  Garbage  Collector  Page  Tags  (GCPT),  holds  a  bit  for  each  of  the  pages 
present  in  physical  memory.  Only  one  bit  is  stored,  and  thus  no
information  about  which  ephemeral  level  the  pointers  in  the  page  may  point  to  is
recorded;  the  bit  says  only  that  a  pointer  to  an  ephemeral  level  may  still  be 
present  on  the  page. 

Another  table,  called  the  Ephemeral  Space  Reference  Table  (ESRT),  is 
stored  sparsely,  and  contains  for  each  swapped-out  page  a  bit  for  each
ephemeral  level;  the  bit  is  set  to  indicate  that  the  page  contains  a  reference  to
the  level  in  question.  The  table  is  maintained  entirely  in  physical  memory, 
and  allows  the  garbage  collector  to  determine  as  it  garbage-collects  a  level 
whether  to  scan  a  page;  because  scanning  a  page  requires  fetching  it  from 
backing  store,  the  maintenance  of  per-level  information  saves  page  faults. 
Clearly,  the  ESRT  must  be  updated  whenever  a  page  is  ejected  from
physical  memory.

Given  that  the  intention  is  to  avoid  any  needless  scanning  of  pages  residing 
on  backing  store,  how  must  the  ESRT  be  updated  when  pages  are  ejected 
from  physical  memory?  If  the  page  has  not  been  written  at  all,  its  ESRT 
entry  (if  extant)  need  not  be  updated.  If  the  page  has  been  written,  and 
already  has  an  ESRT  entry,  then  the  ESRT  entry  must  be  updated  regardless 
of  the  setting  of  the  page’s  GCPT  bit,  because  it  is  possible  that  data  written 
into  it  overwrote  all  the  page’s  pointers  to  ephemeral  objects.  If  the  page
has  no  ESRT  entry,  then  it  need  only  be  scanned  if  its  GCPT  entry  is  set; 
only  then  might  one  have  to  create  an  ESRT  entry  for  it. 
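The three cases above reduce to a short decision procedure, sketched here in Python. The function name, its parameters, and the representation of the ESRT as a dictionary from page number to a set of levels (in place of one bit per level) are all assumptions made for illustration; dropping an entry whose scan finds no references is likewise my reading of "stored sparsely".

```python
def on_page_eviction(page, dirty, gcpt_bit, esrt, scan_levels):
    """Update esrt when a page is ejected from physical memory.

    esrt maps a page number to the set of ephemeral levels it refers to.
    scan_levels(page) scans the page and returns that set.
    Returns True if the page had to be scanned.
    """
    if not dirty:
        return False            # never written: any existing entry is still valid
    if page in esrt or gcpt_bit:
        # Written with an existing entry: always rescan, since the writes may
        # have destroyed every ephemeral pointer. Written without an entry:
        # rescan only because the GCPT bit says a pointer may be present.
        levels = scan_levels(page)
        if levels:
            esrt[page] = levels
        else:
            esrt.pop(page, None)
        return True
    return False                # written, no entry, GCPT bit clear: nothing to do
```

The only pages scanned at eviction are dirty pages that either already have an entry or have their GCPT bit set, so clean pages and pages that never held ephemeral pointers cost nothing.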


2.3.2  Implementing  the  Symbolics  Ephemeral  Garbage  Collector 
Without  Special-Purpose  Hardware 


Suppose  one  wished  to  implement  a  garbage  collector  similar  to  the
Symbolics  Ephemeral  Garbage  Collector  on  a  general-purpose  computer  in  which  one
had  the  cooperation  of  the  virtual  memory  system.  It  will  be  immediately
apparent  that  the  technique  of  scanning  pages  makes  use  of  the  3600’s  fully
tagged  architecture.  Because  each  word  has  a  tag  that  tells  whether  it  is  a 
pointer,  one  may  begin  scanning  memory  in  the  midst  of  an  object,  without 
risk  of  interpreting  non-pointer  data  (such  as,  say,  collections  of  characters 
in  a  string)  as  pointers. 

On  a  general-purpose  machine,  one  would  need  to  use  some  other  method 
of  determining  whether  words  examined  were  pointers.  One  possibility  is 
to  divide  all  data  into  two  types:  those  containing  pointers  and  those  not 
containing  pointers.  This  is  certainly  possible  using  the  data  formats  in 
(for  example)  Lucid  Common  LISP,  but  there  is  a  price  to  be  paid.  Storage 
management  is  complicated  by  the  addition  of  a  parallel  set  of  storage  spaces. 
Locality  of  reference  is  degraded. 

Another  possibility  is  to  maintain  in  a  table  an  entry  for  each  page  giving  the 
offset  from  the  base  of  the  page  to  its  first  tagged  word.  A  slight  amount  of 
overhead  at  object  creation  time  suffices  to  maintain  the  table  entries.  In  the 
data  formats  used  by  Lucid  Common  LISP,  and  a  number  of  other  modern 
LISP  implementations  for  general-purpose  computers,  untagged  words  never 
follow  tagged  words  without  object  headers  appearing  in  between,  because 
otherwise  the  linear  scan  of  a  space  used  as  the  root  set  in  copying  garbage 
collection  would  be  impossible.  The  overhead  for  maintaining  a  memory- 
mapped  table  of  such  offsets  on  a  32-bit  computer  with  256-word  pages  is  8 
bits  per  256  words,  or  about  0.1%;  this  is  certainly  not  a  deleterious  amount 
of  storage. 
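
The storage figure quoted above, and the bookkeeping done at object creation, can be illustrated with a short sketch. It assumes the 32-bit words and 256-word pages of the text; the dictionary-based table and the name `note_tagged_word` are hypothetical simplifications, not the Lucid implementation.

```python
WORD_BITS = 32      # word size assumed in the text
PAGE_WORDS = 256    # page size assumed in the text
ENTRY_BITS = 8      # one byte holds any word offset from 0 to 255


def table_overhead(entry_bits=ENTRY_BITS, page_words=PAGE_WORDS,
                   word_bits=WORD_BITS):
    """Fraction of memory spent on one table entry per page."""
    return entry_bits / (page_words * word_bits)


def note_tagged_word(first_tagged, word_address, page_words=PAGE_WORDS):
    """Record a tagged word; keep only the lowest offset seen on each page."""
    page, offset = divmod(word_address, page_words)
    if page not in first_tagged or offset < first_tagged[page]:
        first_tagged[page] = offset


offsets = {}
for address in (300, 260, 700):   # hypothetical word addresses of new objects
    note_tagged_word(offsets, address)

print(f"overhead = {table_overhead():.4%}")
print(offsets)
```

A page scan would then start at `page * PAGE_WORDS + offsets[page]` rather than at the base of the page.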

We  have  surmounted,  then,  the  problems  for  garbage  collection  associated 
with  maintaining  untagged  words  in  the  system;  what  about  maintaining 
the GCPT? As the discussion above suggests, rather than maintaining a
GCPT,  we  may  simply  use  the  table  of  dirty  bits  in  the  virtual  memory 
system.  When  it  is  time  to  perform  a  garbage  collection,  we  scan  all  the 
pages  whose  dirty  bits  are  set;  because  these  pages  are  necessarily  all  in 
physical  memory,  this  operation  should  not  take  long  compared  to  the  time 
required  to  fetch  pages  from  backing  store.  Exactly  how  long  might  it  take 
to  scan  a  page?  Figure  1,  on  page  13,  gives  the  MC68020  code  and  timing 
to  scan  a  256-word  page. 

The common path through the loop, in the cache case, is some 51 cycles per
word; on an MC68020 clocked at 16 megahertz, this corresponds to some 816
microseconds for a 256-word page. The immediate tagged case is not included;
it  would  involve  checking  the  high  five  bits  of  the  low  byte  to  determine  the 
object  type.  The  immediate  tagged  case  will  often  (in  the  case  of  vectors  of 
untagged  data,  for  example)  lead  to  the  skipping  of  some  number  of  words 
of  untagged  data.  This,  and  the  fact  that  it  is  reasonable  to  expect  that  the 
loop  will  reside  entirely  in  the  MC68020’s  256-byte  on-chip  instruction  cache, 
give  us  some  confidence  that  816  microseconds  is  a  conservative  estimate. 
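
The 816-microsecond estimate follows directly from the cycle count. A minimal sketch of the arithmetic, using the three totals given in Figure 1 (the function name is hypothetical):

```python
def page_scan_microseconds(cycles_per_word, words_per_page, clock_mhz):
    """Time to scan one page when every word takes the same path."""
    return cycles_per_word * words_per_page / clock_mhz


# Best, cache, and worst case totals from Figure 1, at 16 MHz.
for label, cycles in (("best", 15), ("cache", 51), ("worst", 66)):
    micros = page_scan_microseconds(cycles, 256, 16)
    print(f"{label:>5} case: {micros:.0f} microseconds per page")
```

The cache case gives 51 x 256 / 16 = 816 microseconds, the figure used in the text.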

Here is where special-purpose hardware comes into its own, then; the
MC68020 takes nearly ten times as long to perform a page scan as does the
3600. How significant is this page-scanning time? The difference between
the two is some 731 microseconds, but this does not tell nearly the whole
story, as the 3600’s GCPT bits are unlike dirty bits in that they are only
set when the word stored is actually a pointer to an ephemeral level. A
general-purpose computer does not perform tag or boundary checks, so many
pages in the LISP process’s address space will have their dirty bits set
without pointers to ephemeral levels actually having been stored in them.
These pages will be scanned needlessly at every garbage collection (see
note A.2 for a comparison with the scheme described by Shaw [12]).

Metering would allow us to determine how many pages would actually be
found needlessly dirty in typical LISP applications; however, even without
such measurements, we can envision a worst case. Let us consider the
performance of the Symbolics ephemeral garbage collector on a 3600 with 1
megaword of physical memory [10, table 2, page 244]. When running the
BOYER benchmark, an average of 11.5 seconds was taken for each flip;
when running the compiler benchmark, the figure was 1.6 seconds.

;;; First tagged word is in a0; segment table in a1, word
;;; after last on page in a2. Parenthesized numbers after
;;; instructions give best case, cache case, and worst case
;;; timings, as per MC68020 User’s Manual.
scanlp: cmp.l   a0, a2          ;done yet? (1 4 4)
        ;; branch when done scanning, ge because untagged
        ;; structure may cross page boundary.
        bge     scandn          ;(1 4 5) (branch not taken)
        move.l  (a0)+, d0       ;d0 holds pointer (4 6 7)
        ;; fetch tag bits; bashes low byte
        andi.b  #$07, d0        ;(0 2 3) + (0 2 3)
        ;; see if this might be a header
        cmpi.b  #dtp-other-immediate, d0
                                ;(0 2 3) + (0 2 3)
        ;; handle if so; this is also the case for tagged
        ;; characters and small floats (i.e., not in strings
        ;; or vectors)
        beq     check-header    ;(1 4 5) (branch not taken)
        ;; now want only low two bits of tag
        andi.b  #$03, d0        ;(0 2 3) + (0 2 3)
        ;; low bits are 0 if fixnum
        beq     scanlp          ;(1 4 5) (branch not taken)
        ;; if we got this far, this is a pointer. Fetch
        ;; segment index. Note high part still valid.
        swap    d0              ;(1 4 4)
        ;; does the pointer point into an ephemeral segment?
        cmpi.b  #ephemeral-segment, (d0, a1)
                                ;(0 2 3) + (3 5 6)
        ;; no, continue
        bne     scanlp          ;(3 6 9) (branch taken)

                Best Case   Cache Case   Worst Case
Totals:            15           51           66

Figure 1: Page-scanning on the MC68020.

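
For readers who do not follow MC68020 assembly, the logic of the loop in Figure 1 can be paraphrased in Python. This is a sketch under stated assumptions: the numeric values of `DTP_OTHER_IMMEDIATE` and `EPHEMERAL_SEGMENT` are invented for illustration, the page is assumed to hold only tagged words, and header handling (the `check-header` branch) is reduced to skipping the word.

```python
DTP_OTHER_IMMEDIATE = 0b110   # hypothetical 3-bit tag value
EPHEMERAL_SEGMENT = 1         # hypothetical marker byte in the segment table


def page_has_ephemeral_ref(words, segment_table):
    """Return True if any word on the page points into an ephemeral segment."""
    for word in words:
        if word & 0x07 == DTP_OTHER_IMMEDIATE:
            continue              # header or other immediate: not a pointer
        if word & 0x03 == 0:
            continue              # low two tag bits clear: a fixnum
        segment = (word >> 16) & 0xFFFF   # high half is the segment index
        if segment_table[segment] == EPHEMERAL_SEGMENT:
            return True
    return False


segments = [0, 0, 0, EPHEMERAL_SEGMENT]    # only segment 3 is ephemeral
fixnum = 4 << 2
static_pointer = (2 << 16) | 0x11          # points into segment 2
ephemeral_pointer = (3 << 16) | 0x05       # points into segment 3

print(page_has_ephemeral_ref([fixnum, static_pointer], segments))
print(page_has_ephemeral_ref([fixnum, ephemeral_pointer], segments))
```

The order of the tests mirrors the figure: the header check comes first, then the fixnum check, and the segment table is consulted only for words that survive both.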
On our general-purpose computer, if every one of 1024 pages were found
dirty at garbage collection time, but contained no ephemeral references,
then there might be some 2.9 seconds of overhead per garbage collection.
This is 25.2% of the BOYER benchmark time, and 153% of the compiler
benchmark time. I stress that this is the worst possible case, with every
page in physical memory found dirty at garbage collection time. We expect
that in real applications a much smaller proportion of the pages would be
found dirty (many of them, for example, might contain pure code), and, of
those found dirty, many would not need to be scanned (for example, those
in the youngest ephemeral level, or in the unscanned spaces).

Consider  now  the  time  required  to  update  the  ESRT.  On  our  general-purpose 
computer,  we  use  the  virtual  memory  system’s  table  of  dirty  bits  for  the 
GCPT, so that, if we wish to ensure that ESRT entries are always correct
for  pages  on  backing  store,  we  must  scan  the  page  and  potentially  update  or 
create  an  ESRT  entry  whenever  the  ejected  page’s  dirty  bit  is  set.  The  case 
where  the  3600’s  special-purpose  hardware  will  help  it  is  that  where  a  word 
was  written  into  a  page,  but  the  word  was  not  a  pointer  to  an  object  in  an 
ephemeral  space.  But  here  we  assume  that  the  virtual  memory  system  on 
our  machine  has  given  the  garbage  collector  a  trap  after  initiating  a  seek  on 
the  disk;  the  scan  time  is  easily  entirely  subsumed  in  the  average  seek  time 
of  any  modern  disk  drive,  just  as  on  the  3600. 

Thus  the  3600’s  special-purpose  hardware  does  indeed  help  it,  but  not  nearly 
so  much  as  one  might  think.  Given  our  belief  that  the  average  overhead 
for  page-scanning  on  stock  hardware  will  not  account  for  most  of  the  time 
spent  in  garbage  collection,  we  believe  that  with  a  slightly  faster  processor, 
such  as  an  MC68020  clocked  at  25  megahertz,  the  3600’s  speed  advantage 
will  disappear.  Note  that  the  time  estimates  given  are  for  time  spent  in 
the  garbage  collector;  the  overhead  required  at  user  program  run  time  to 
keep  track  of  ephemeral  objects,  estimated  by  Moon  as  ‘at  least  10%  and 
possibly  a  factor  of  two  or  more,’  [10,  page  243]  is  actually  limited  to  the 
nearly  insignificant  time  added  to  object  creation  in  order  to  update  the 
table  of  offsets  to  the  first  tagged  object  in  each  page  -  provided  one  has 
the  collusion  of  the  virtual  memory  system. 


2.3.3 Address Space Utilization

Both  the  Symbolics  Ephemeral  Garbage  Collector  and  the  stock-hardware 
derivative  we  propose  here  copy  objects  from  one  ephemeral  level  to  the  next; 
thus  the  pages  used  for  the  creation  of  new  objects  are  the  same  before  and 
after  a  garbage  collection.  We  expect  good  virtual  memory  performance 
from  this  scheme. 


2.3.4  Suitability 

Moon’s  ephemeral  garbage  collector  does  not  lend  itself  to  easy  criticism. 
While  his  paper  does  not  anticipate  the  implementation  of  his  scheme  on 
general-purpose  computers,  and,  indeed,  discounts  the  idea,  the  author 
feels that, given the cooperation of the virtual memory system, Moon’s
garbage  collection  scheme  would  perform  quite  well  on  a  general-purpose 
computer.  General-purpose  computers  are  at  a  disadvantage  in  the  task  of 
page-scanning,  but  even  inefficient  scanning  of  pages  is  a  small  price  to  pay 
to  escape  the  need  to  explicitly  record  using  additional  software  the  storage 
of  pointers  into  ephemeral  levels. 


Part II

A  Lifetime-based  Garbage 
Collector  for  LISP  Systems  on 
General-Purpose  Computers 

In  what  follows,  I  use  Moon’s  terminology.  I  call  objects  that  have  not 
yet  been  moved  to  spaces  intended  to  hold  relatively  permanent  objects 
ephemeral  objects.  Moving  an  object  from  a  space  that  holds  newer  objects 
to  a  space  that  holds  older  objects  is  called  advancement.  The  several  spaces 
that  hold  ephemeral  objects,  organized  so  that  the  ages  of  the  objects  within 
them  vary  monotonically,  are  referred  to  as  ephemeral  levels.  The  ephemeral 
level  holding  the  youngest  objects  is  called  either  the  youngest  ephemeral 
level  or  the  first  ephemeral  level;  that  holding  the  oldest  objects  is  called 
either  the  oldest  ephemeral  level  or  the  last  ephemeral  level. 


3  Desiderata 

A  number  of  constraints  are  forced  by  the  use  of  general-purpose  computers, 
and  by  the  desire  to  write  a  portable  garbage  collector;  that  is,  one  that  does 
not  require  the  cooperation  of  the  virtual  memory  system.  A  summary  of 
the  design  goals  follows: 

•  Performance.  We  wanted  each  garbage  collection  to  be  fast;  that  is,  we 
wanted  to  minimize  the  amount  of  computation  required  for  a  garbage 
collection,  so  that  the  ephemeral  garbage  collector  would  be  suitable 
for  use  in  interactive  systems. 

We  also  hoped  not  to  unduly  slow  down  user  code.  Because  we  do  not 
have  the  cooperation  of  the  virtual  memory  system,  pointer-settings 
must  be  recorded  explicitly,  so  that  some  slowdown  is  inevitable  in 
execution  time  of  code  that  does  not  allocate  storage,  and  thus  does 
not  cause  garbage  collections.  We  wished  to  minimize  this  slowdown. 
In code that creates and discards many objects, like the BOYER
benchmark  [7],  we  wished  to  realize  overall  performance  improvements. 

•  Portability.  The  design  was  not  to  require  cooperation  from  the  virtual 
memory  system,  nor  was  it  to  be  tied  to  a  particular  architecture.  It 
also  had  to  be  easy  to  retrofit  into  existing  LISP  implementations  for 
various  machines. 

•  Predictability.  Many  users  of  LISP  carefully  code  their  programs  to 
avoid  any  object  creation,  so  that  no  unexpected  delays  will  occur;  for 
example, a robot control program cannot afford even a 20-millisecond
delay.  Programs  that  do  not  create  objects  should  not  cause  garbage 
collections,  or  be  subjected  to  unexpected  delays,  as  for  reorganization 
of  internal  tables. 

•  Flexibility.  We  wanted  the  ephemeral  garbage  collector  to  be  tunable; 
the number of levels and their sizes were to be easily modifiable,
because the parameter settings for best performance were likely to vary
between  applications. 

• Robustness. The scheme we selected was not to be prone to failure
during garbage collection. We wished to avoid schemes that could
conceivably run out of memory when advancing objects from one
ephemeral level to the next.


4 Early Decisions


With  the  goals  discussed  in  the  previous  section  in  mind,  it  was  possible 
to  make  many  design  decisions  before  committing  to  a  specific  scheme  for 
recording  pointers.  These  decisions  are  discussed  below. 


4.1 Optimizing the Task of Keeping Track of Ephemeral
Objects

Without  special  support  from  the  virtual  memory  system,  the  greatest 
source  of  inefficiency  in  lifetime-based  garbage  collection  systems  on  general- 
purpose  computers  is  the  recording  of  pointers  into  ephemeral  spaces.  This 
recording  must  be  performed  in  software,  replacing  what  was  formerly  one 
instruction;  this  increases  the  size  of  the  compiled  code  image,  even  if  an 
out-of-line  call  is  performed,  and  has  a  varying,  but  always  negative,  effect 
upon  performance,  dependent  upon  the  dynamic  frequency  of  pointer  stores. 

This  effect  is  manifested  most  strongly  in  high-performance  systems  with 
native  code  compilation;  it  is  not  nearly  so  much  a  problem  in,  for  example, 
Ungar’s  Berkeley  Smalltalk  system  (discussed  in  Section  2.1,  above),  because 
this  system  utilizes  a  byte-code  interpreter  that  executes  only  some  9,000 
instructions  per  second.  In  [10,  page  246],  Moon  makes  the  point  that  the 
performance  of  Ungar’s  generation-scavenging  looked  good  because  Berkeley 
Smalltalk  takes  about  50  machine  instructions  to  do  a  store;  the  overhead  of 
adding  an  object  to  the  remembered  set  is  not  overwhelming  by  comparison. 

One somewhat ameliorating factor is the possibility of performing certain
compile-time optimizations; as noted below, the Lucid Common LISP
compiler does in fact perform these optimizations for the benefit of the
lifetime-based garbage collection scheme we implemented. The compiler
optimizes out pointer-recording when the pointer being stored is a
constant immediate quantity, such as a character or small integer, or
points to a constant static entity, such as a symbol.

One  more  significant  optimization  is  performed;  when  the  current  object 
creation  area  is  the  youngest  ephemeral  level,  the  object-creation  subroutines 
used  do  not  record  storage  of  initial  values  in  newly-created  objects,  as  these 
will have been created in the youngest ephemeral level, and any pointers from
them  will  be  pointers  ‘backwards  in  time,’  to  use  Lieberman  and  Hewitt’s 
terminology. 
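
The two optimizations just described amount to a filter applied before any recording code is emitted or executed. The sketch below is illustrative only: the kind names and the function `needs_recording` are hypothetical, and the real compiler works on its own internal representation rather than on strings.

```python
CONSTANT_IMMEDIATES = {"character", "fixnum"}   # constant immediate quantities
CONSTANT_STATICS = {"symbol"}                   # constant static entities


def needs_recording(stored_kind, is_compile_time_constant,
                    initializing_in_youngest_level=False):
    """Return True if a pointer store must be recorded for the collector."""
    if initializing_in_youngest_level:
        # Initial values of a new object in the youngest level can only be
        # pointers backwards in time, so nothing is recorded.
        return False
    if is_compile_time_constant and (
            stored_kind in CONSTANT_IMMEDIATES
            or stored_kind in CONSTANT_STATICS):
        # The compiler can prove the value is not an ephemeral reference.
        return False
    return True


print(needs_recording("fixnum", True))
print(needs_recording("cons", False))
print(needs_recording("cons", False, initializing_in_youngest_level=True))
```

Every store that fails both tests still pays the full recording cost, which is why the choice of recording scheme matters.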


4.2 Set-Associative Pointer-Recording

One  possible  pointer-recording  scheme  would  use  a  set-associative  table  of 
locations  holding  pointers  to  ephemeral  spaces. 

Suppose we maintained a set-associative table of 256 lines, with 16 one-word
entries per line, for each ephemeral level; this gives some 16 kilobytes of
table per ephemeral level. Assume further that we worried only about
updating the table for pointer settings; that we did not worry about
removing entries in the table for pointers that were written over. When any
line in the table was completely full, we could use one of several
(expensive) strategies to reduce the problem; possibly we could begin
allocating in the ephemeral level in question a list of pointers into it, or
perform a scavenge of the level in question into the next older level, in
which case we would need to add to that level’s table only the references
that were not in that level, and we would likely have enough space for them;
etc. But leaving aside the question of dealing with entirely full lines, we
depict in Figure 2 an MC68020 instruction sequence that performs a
pointer-setting when using a set-associative table for recording pointers
into ephemeral levels.³

The  common  case  (set  a  pointer  from  the  current  ephemeral  level  to  the 
current  ephemeral  level)  requires  nine  instructions,  not  counting  the  pointer 
setting.  Making  the  table  entry  requires  ten  instructions  in  the  case  where 
the  first  entry  examined  is  empty,  and  five  more  instructions  per  time  around 
the loop. This sequence of instructions is so large as to mandate an
out-of-line subroutine call; this would entail further overhead at runtime.
Note also
that  we  are  making  free  use  of  two  address  registers  beyond  those  holding 
the  pointer  and  the  destination,  and  three  data  registers;  thus  the  compiler 
will  have  fewer  registers  at  its  disposal,  with  the  attendant  negative  impact 
on  efficiency  of  surrounding  code. 

³We do not move using the MC68020 memory indirect post-indexed addressing
mode (although it would save one instruction), as it is slower than the
combinations of instructions we do use.

        ;; on entry, a0 holds the destination, and a1 the
        ;; pointer. Fetch byte table of ephemeral levels
        ;; from systemic quantities vector (SQ), which is
        ;; held in an address register.
        move.l  (elevel, SQ), a2
        ;; prepare to fetch segment index of destination
        move.l  a0, d0
        swap    d0              ;get segment index in low 16
        ;; store in d1 ephemeral level of destination
        move.b  (0, a2, d0.w), d1
        move.l  a1, d0
        swap    d0
        ;; d2 gets ephemeral level of pointer
        move.b  (0, a2, d0.w), d2
        ;; now we compare ephemeral levels to see if we need
        ;; to make a recording table entry.
        cmp.b   d1, d2
        bne     hktabl          ;different levels, make entry
        ;; same, just set pointer (most common case)
        (set pointer and exit)
        ;; fetch table of tables from SQ
hktabl: move.l  (extbls, SQ), a2
        ;; index by ephemeral level of pointer
        move.l  (0, a2, d2.w), a2
        move.l  a0, d0          ;get dest in data register again
        ;; lines are at 64-byte intervals; want bits 13-0
        ;; with low 6 cleared for index of our line. Get
        ;; line index in d0.
        andi.l  #$00003FC0, d0
        add.l   d0, a2          ;line address in a2
        ;; 16 entries per line, but testing at bottom
        move.l  #15, d1
        ;; find entry that’s empty or same
loop:   move.l  (a2)+, a3
        beq     found           ;0 is empty entry
        cmp.l   a0, a3          ;if same, done, go to setit
        beq     setit
        dbra    d1, loop
        ;; no empty entry in this line
        (deal with the problem, possibly by creating an
        extension to the line)
found:  move.l  a0, -(a2)       ;make entry
        bra     setit           ;go set pointer

Figure 2: MC68020 code to record ephemeral reference locations in a
set-associative table.

This scheme does have the advantage of compactness of representation of
the recorded information, but has two major disadvantages. The first is that
the  number  of  instructions  executed  to  set,  for  example,  a  special  variable, 
is  at  least  nineteen,  and  likely  more.  The  second  disadvantage  is  that  it 
is  not  clear  how  to  proceed  when  a  line  is  filled;  the  delays  necessary  for 
compaction  or  garbage  collection  might  be  too  large.  Two  design  goals, 
performance  and  predictability,  are  violated;  the  scheme  was  not  considered 
further. 
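
To make the rejected scheme concrete, here is a minimal Python model of one level's set-associative table as described above: 256 lines of 16 one-word entries, with the line chosen by masking the destination address with $00003FC0. The class and exception names are hypothetical, and the full-line case is simply reported rather than handled, since the text leaves that strategy open.

```python
LINES = 256
ENTRIES_PER_LINE = 16
INDEX_MASK = 0x3FC0          # address bits 13-6 select one of 256 lines
EMPTY = 0                    # as in Figure 2, zero marks an empty entry


class LineFull(Exception):
    """Raised when all 16 entries of the selected line are in use."""


class SetAssociativeTable:
    def __init__(self):
        self.lines = [[EMPTY] * ENTRIES_PER_LINE for _ in range(LINES)]

    def record(self, destination):
        """Record a location; return True if a new entry was made."""
        line = self.lines[(destination & INDEX_MASK) >> 6]
        for i, entry in enumerate(line):
            if entry == destination:
                return False         # already recorded
            if entry == EMPTY:
                line[i] = destination
                return True
        raise LineFull(hex(destination))


table = SetAssociativeTable()
print(table.record(0x1040))      # first store to this location
print(table.record(0x1040))      # repeated store finds the existing entry
print(LINES * ENTRIES_PER_LINE * 4, "bytes per level")
```

Seventeen distinct locations that share the same bits 13-6 are enough to raise `LineFull`, which is the unpredictable case the text objects to.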


4.3  Avoiding  the  Overhead  of  Determining  Spaces  When 
Storing  Pointers 

We noted that set-associative pointer-recording had two distinct
disadvantages; the second had to do exclusively with the structures used for
recording the storage of pointers into ephemeral levels, but the first
disadvantage lay partly in the expense of that recording, and partly in the
expense of determining the spaces for a pointer being stored and the
location it is stored in.

On the 3600, when a word is stored into memory, it is examined (in parallel
with the memory access) to see if it is a reference to an ephemeral area
being stored into either another ephemeral area, or into a non-ephemeral
area; if it is, the fact is recorded by setting a bit in the GCPT. Moon
states that the reason custom hardware is required to implement a
lifetime-based garbage collector is that this examination would have to be
performed in software on a general-purpose machine, and would take between
2.5 and 25 microseconds. As we saw in Figure 2, which showed MC68020 code
for maintaining set-associative tables of pointers into ephemeral spaces
(the determination of spaces would be the same), it would also require the
use of several registers, thus slowing down execution of the surrounding
code. Finally, the nine instructions required simply to determine the
ephemeral levels of the pointer and the location in which it is stored,
before ever recording its storage, would have a serious impact upon
performance of code that did not garbage collect.

What  we  wish  to  do  is  move  some  of  the  overhead  of  this  operation  from 
pointer-setting  time  to  garbage  collection  time.  The  critical  portions  of  the 
garbage  collector  can  be  coded  in  assembly  language,  and  can  use  as  many 
registers as necessary; there will be only one copy of this code, so the
number of instructions used will not be critical to the image size, as it
would be if the operation were being coded in-line at every pointer
storage. Two other important points are:


•  Many  of  the  pointers  stored  during  the  execution  of  a  program  will 
be  stored  in  locations  in  the  youngest  ephemeral  level.  The  garbage 
collector  will  never  have  to  determine  the  ephemeral  level  to  which 
these  pointers  point,  because  all  pointers  stored  in  the  youngest  level 
are  pointers  backwards  in  time,  in  the  terminology  of  Lieberman  and 
Hewitt. 

•  In  many  programs,  a  single  set  of  locations  is  repeatedly  written.  If 
there  are  several  pointer  stores  to  a  single  location  between  garbage 
collections,  the  ephemeral  level  of  only  the  last  pointer  stored  matters; 
thus  the  work  done  to  determine  ephemeral  levels  in  the  other  pointer 
stores  is  wasted. 


These  considerations  suggest  that,  rather  than  determining  the  spaces  of  the 
pointer  and  the  location  it  is  stored  in  at  pointer-storage  time,  we  should 
adopt  some  sort  of  scheme  whereby  we  record  only  that  a  location  has  been 
modified,  and  postpone  until  garbage-collection  time  the  determination  of 
the  space  the  pointer  was  stored  in  and  the  space  it  pointed  into.  We  do 
not even attempt to determine at runtime whether the pointer is actually
an immediate constant, such as a character or fixnum.⁴

In using this scheme, we will often have to examine at garbage-collection
time locations that do not contain ephemeral references at all. This
examination will cost very little if the page containing the location in
question is in main memory; if it is on backing store, the cost will be much
greater. Our hope is that the technique of lifetime-based garbage collection
so improves locality of reference as to decrease substantially the number of
cases where the working set exceeds the available physical memory.⁵
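
The division of labor proposed in this section can be sketched as follows. The store barrier records only the location written; classification is deferred to garbage-collection time. Everything here is an illustrative assumption: memory is a dictionary, values are `("ptr", address)` or `("fix", n)` pairs, `level_of` is a hypothetical function giving an address's ephemeral level (0 is youngest, `None` is non-ephemeral), and a location is kept as a root only when it holds a pointer into a younger level than its own.

```python
def store(memory, modified, location, value):
    """Pointer store with deferred recording: no space checks are made here."""
    memory[location] = value
    modified.append(location)


def roots_at_gc(memory, modified, level_of):
    """Classify recorded locations once, at garbage-collection time."""
    roots = set()
    for location in set(modified):        # repeated stores collapse to one
        if level_of(location) == 0:
            continue                      # youngest level: backwards in time
        kind, payload = memory[location]  # only the last value stored matters
        if kind != "ptr":
            continue                      # immediate, e.g. a fixnum
        target = level_of(payload)
        if target is None:
            continue                      # not an ephemeral reference
        own = level_of(location)
        if own is None or own > target:
            roots.add(location)           # pointer forwards in time
    return roots


def level_of(address):                    # hypothetical address layout
    if address < 100:
        return 0
    if address < 200:
        return 1
    return None


memory, modified = {}, []
store(memory, modified, 300, ("ptr", 5))     # later overwritten
store(memory, modified, 300, ("fix", 7))
store(memory, modified, 150, ("ptr", 10))    # level 1 location, level 0 target
store(memory, modified, 20, ("ptr", 150))    # location in the youngest level
store(memory, modified, 400, ("ptr", 500))   # target is not ephemeral
print(sorted(roots_at_gc(memory, modified, level_of)))
```

The store path is now a single append, at the price of examining four locations at collection time that turn out not to matter.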

⁴Although, as mentioned in Section 4.1 above, compile-time optimizations
may be exploited.

⁵Measurements of the number of pointer stores recorded, but not containing
ephemeral references, during the execution of various symbolic processing
tasks would be useful in evaluating this scheme, as would a measurement of
the number of pages containing recorded locations ejected to backing store
between ephemeral garbage collections on systems with varying amounts of
memory. These and other recommendations for future analysis are described
in Section 8.


4.4 The Organization of Ephemeral Spaces in Memory

We concluded that the maintenance of generation counts was undesirable in
LISP systems.⁶ Without generation counts, we have two alternatives when
performing lifetime-based garbage collection:


• We can organize each ephemeral level into semispaces, and copy
objects from one semispace to another until a certain number of garbage
collections have been completed. This gives address space utilization
much like that of Tektronix’s Large Object Space Smalltalk garbage
collector.

• We can garbage-collect each ephemeral level into the next older level,
as in the Symbolics ephemeral garbage collector.


We concluded that the first alternative is more likely to result in poor
virtual memory performance than the second;⁷ thus we chose to use a
division of memory into successive ephemeral levels, each of which is
garbage-collected into the next older level.


4.4.1 Pointers Backwards in Time

Lieberman and Hewitt’s garbage collector (discussed in Section 1.2.1)
recorded only ‘pointers forwards in time,’ that is, those from either
non-ephemeral spaces to ephemeral spaces or from older ephemeral levels to
younger ephemeral levels. Thus the garbage collection of a level required
scanning all younger levels as members of the root set; otherwise pointers
‘backwards in time,’ from younger levels to older levels, would not be
updated to point to the newly copied referents, and might possibly lose
their referents altogether.

Moon’s ephemeral garbage collector is also incremental, and can scavenge
several levels at once while running user code. His solution is to record
pointers backwards in time by the same means as pointers forwards in time;
because his representation of recording information is very compact, this is
inexpensive.

⁶See Section 2.1.3.

⁷See Sections 1.2.2 and 2.2.2.

In  a  stop-and-copy  scheme,  user  code  may  not  run  until  a  garbage  collection 
has  completed.  When  we  have  finished  copying  from  a  younger  level  to  an 
older  level,  the  younger  level  will  be  empty.  If  the  older  level  is  now  full,  we 
may  scavenge  it  without  scanning  younger  levels,  as  these  will  be  empty. 


4.4.2  Implications  for  Memory  Organization 

This ordering of events allows us to organize our ephemeral spaces in a
convenient fashion. Many operating systems on general-purpose computers
require contiguous memory allocation; thus, if garbage-collecting an
ephemeral level would require filling the next level beyond its capacity,
space must be set aside for the overflow. Suppose we order our ephemeral
spaces as depicted in Figure 3.⁸ The size of the odd-level overflow segment
pool (OSP) is the sum of the sizes of all the even levels except for the
last ephemeral level, if it is an even level; similarly, the size of the
even-level OSP is the sum of the sizes of all the odd levels except for the
last ephemeral level, if it is an odd level. Thus the total space occupied
by overflow segments is less than the space occupied by ephemeral data.

Suppose we perform a copying garbage collection from level 0 to level 1, and
level 1 is full, and all data in level 0 are retained; we may allow data to
overflow from level 1 into the odd-level overflow segment pool. Because the
size of the odd-level OSP is at least that of level 0, we are guaranteed
that there will be room to copy the data from level 0 into level 1 and the
odd-level OSP, so our copying garbage collector may simply continue copying
past the end of level 1 into the OSP. When we garbage-collect level 1 into
level 2, level 1 may contain as much data as level 1 and level 0 put
together. If level 2 is also entirely full, we are still guaranteed room to
complete the garbage collection, because level 1’s size was included in the
size of the even-level OSP, and level 0 is now empty, so we simply continue
copying past the end of level 1 into level 0.
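
The sizing rule for the two overflow segment pools is easy to check numerically. The sketch below takes a list of level sizes (index 0 is the youngest level) and applies the rule stated above; the function name and the example sizes are hypothetical.

```python
def osp_sizes(level_sizes):
    """Return (odd_level_osp, even_level_osp) for the given level sizes.

    The odd-level OSP is the sum of the even levels and the even-level OSP
    is the sum of the odd levels, in both cases excluding the last level.
    """
    last = len(level_sizes) - 1
    odd_osp = sum(size for i, size in enumerate(level_sizes)
                  if i % 2 == 0 and i != last)
    even_osp = sum(size for i, size in enumerate(level_sizes)
                   if i % 2 == 1 and i != last)
    return odd_osp, even_osp


sizes = [8, 4, 2, 1]                  # hypothetical sizes, in segments
odd_osp, even_osp = osp_sizes(sizes)
print("odd-level OSP:", odd_osp, "even-level OSP:", even_osp)
print("overflow total:", odd_osp + even_osp, "ephemeral total:", sum(sizes))
```

Because the last level never contributes to either pool, the two pools together always come to the ephemeral total minus the size of the last level, which is the inequality claimed in the text.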

⁸I am indebted to James Boyce, of Lucid, Inc., for this suggestion.

Figure 3: Layout of ephemeral spaces and overflow segment pools in Lucid
Common LISP.

Note that this strategy has good implications for virtual memory usage. The
pages  overflowed  into  are  pages  that  recently  held  other  ephemeral  data;  thus 
we  have  a  good  chance  that  they  will  still  be  present  in  physical  memory. 

The  garbage  collection  strategy  described  above,  in  which  no  ephemeral 
level  is  garbage-collected  until  all  younger  levels  have  been  emptied,  has  a 
number  of  implications.  First,  note  that,  occasionally,  a  garbage  collection 
will require garbage-collecting all the ephemeral levels. The defaults on the
3600 are to use semispaces that decrease in size by a factor of two; this may
or may not reflect something like the “average” persistence of objects, but
with this progression of sizes, the maximum amount of work required to
garbage-collect m levels would be no more than 2m times the amount of
work to garbage-collect level 0 (for m = 4 levels the factor is actually about
6).^9

For  five  levels,  we  approach  an  order  of  magnitude;  one  risk  in  emptying 
younger levels before garbage-collecting older levels, then, is that the pauses
for  garbage  collection  may  become  noticeable.  In  practice,  this  has  not  been 
a  problem.  A  more  serious  risk  lies  in  the  way  that  a  garbage  collection 
that  causes  the  objects  in  ephemeral  space  to  be  advanced  all  the  way  to 
dynamic  space  (as  some  garbage  collections  inevitably  do)  cannot  help  but 
advance,  at  the  same  time,  all  the  live  objects  that  were  in  the  youngest 
ephemeral  level  at  the  time  of  the  garbage  collection.  These  objects  may  in 
fact  become  garbage  very  soon  after  their  advancement;  they  have  had  less 
than  one  garbage-collection  period  in  which  to  mature  before  being  advanced 
to  dynamic  space.  However,  garbage  collections  that  proceed  all  the  way  to 
dynamic  space  are  much  more  rare  than  those  that  do  not;  we  do  not  expect 
this  “premature  tenuring”  (to  use  Ungar’s  term)  to  be  a  problem. 


4.5  Allocation  of  Very  Large  Objects 

Most  very  large  objects  have  long  lifetimes.  These  objects  may  be,  for 
example,  bitmaps  being  processed  by  image  understanding  programs,  or 
arrays  of  cellular  automata,  or  data  collection  buffers  for  input  devices. 
Sometimes  these  objects  are  so  large  that  they  exceed  the  capacity  of  the 
youngest ephemeral level, which is typically a small fraction of the space
allocated to the process; then they cannot be created in ephemeral space
at all. Some other very large objects will fit in the youngest ephemeral
level, but, because they are permanent, will simply be copied on succeeding
garbage collections through all the ephemeral levels until they are finally
advanced to dynamic space.

^9 See note A.3 for a precise formulation.

This  sort  of  successive  copying  is  inefficient;  we  would  like  to  provide  the 
user  with  a  means  of  causing  objects  larger  than  a  given  size  to  be  created  in 
dynamic  space.  This  is  easily  accomplished  with  a  user-modifiable  parameter 
that  is  checked  by  the  object  creation  routines. 
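
A minimal sketch of such a check, in Python, may make the idea concrete. The names `large_object_threshold` and `allocate`, the `Space` class, and the threshold value are hypothetical and chosen only for illustration; the text does not name the parameter or give its units.

```python
from dataclasses import dataclass, field


@dataclass
class Space:
    """A toy allocation space that only records the sizes placed in it."""
    name: str
    objects: list = field(default_factory=list)


ephemeral_level_0 = Space("ephemeral level 0")
dynamic_space = Space("dynamic space")

# Hypothetical user-modifiable parameter, in words (an assumption).
large_object_threshold = 4096


def allocate(size_in_words):
    """Pick the space for a new object, as an object creation routine might."""
    if size_in_words > large_object_threshold:
        space = dynamic_space        # too large: create directly in dynamic space
    else:
        space = ephemeral_level_0    # normal case: youngest ephemeral level
    space.objects.append(size_in_words)
    return space
```

An object exactly at the threshold is still created in ephemeral space here, since the text speaks of objects "larger than a given size".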

Allocating objects in dynamic space when the normal allocation space is
ephemeral space will eventually result in the filling of dynamic space, and
the attendant dynamic garbage collection; and there will be objects in
ephemeral space at this point. In fact, this situation is not unique to the
allocation of very large objects; whenever we are about to scavenge the oldest
ephemeral level into dynamic space, we must ensure that there is enough
room in dynamic space to hold its contents. The allocation of this space
may result in a dynamic garbage collection, and there will be live data in
ephemeral space during the dynamic garbage collection.


4.6 Dynamic Garbage Collection in the Presence of Ephemeral Objects


When garbage-collecting dynamic space, if ephemeral space holds live
objects, we must somehow arrange for these objects to have their references to
objects in dynamic space updated; possibly this could be done by
garbage-collecting the ephemeral spaces as well. Clearly we cannot proceed as usual,
and simply not move ephemeral data when we encounter it while walking
the tree from the roots; the Cheney algorithm uses copying and reordering
in order to record pending branches, in the same way that other algorithms
use a stack, or pointer reversal. We also do not want to use these other
algorithms; the use of a stack has the well-known problem of deep nesting
causing overflow, and Deutsch-Schorr-Waite pointer reversal requires
visiting each node encountered twice.

One way of approaching the problem is to subdivide it into two cases: one
may either leave the ephemeral data in ephemeral space, or move them all
into dynamic space.


1.  The  obvious  way  to  proceed  if  we  wish  to  leave  the  data  in  ephemeral 
space  is  to  treat  ephemeral  space  as  roots  for  the  dynamic  garbage 
collection. This has the disadvantage of quite possibly causing
preservation of structures that are only pointed to by garbage in ephemeral
space, but we expect that these will be rare.

The algorithm is slightly complicated by the necessity of updating
pointer storage recording structures. We do this by first clearing
the structures recording pointers for the locations in dynamic
fromspace. As scavenging is performed, when a pointer is stored in dynamic
tospace, if it points into ephemeral space, it is recorded in the proper
structure.

2.  The  other  possibility,  that  of  moving  all  the  data  into  dynamic  space, 
causes  premature  tenuring,  to  use  Ungar’s  term,  but  has  the  advantage 
of  being  simpler.  We  simply  treat  data  in  ephemeral  space  in  the  same 
way  we  treat  data  in  oldspace;  we  copy  them  all  into  newspace.  At 
the end of the garbage collection, ephemeral space is empty, and all
recording  structures  are  cleared. 


The  problem  with  scheme  2  is  that,  although  it  is  indeed  simpler  than  scheme 
1,  there  is  no  obvious  way  to  proceed  if  one  exceeds  the  capacity  of  tospace 
during  the  garbage  collection  and  virtual  memory  is  exhausted.  Exceeding 
the  capacity  of  tospace  is  indeed  possible,  as  the  entire  live  contents  of 
ephemeral  space  are  being  added  at  once  to  dynamic  space. 

If we instead treat the ephemeral spaces as roots, we may perform the
dynamic garbage collection without advancing ephemeral data. If we were in
the midst of performing an ephemeral garbage collection, and the dynamic
garbage collection freed enough space to allow the advancement of
ephemeral objects into dynamic space, we would simply continue. If we found
that an amount of space insufficient to allow advancement of ephemeral data
was freed, we might disable dynamic garbage collection, copy the ephemeral
data into the unused semispace, and signal an error to the user, who could
respond by, for example, suspending the LISP process until more virtual
memory is available.

The ability to continue in this fashion requires that the sum of the sizes of
the ephemeral levels be no larger than a dynamic semispace.

We  cannot  use  the  same  sequence  of  operations  with  scheme  2,  because  we 
cannot  perform  the  dynamic  garbage  collection  without  also  beginning  to 
copy  in  the  ephemeral  data,  and,  at  that  point,  we  no  longer  have  the  option 
of  using  the  other  semispace  if  we  should  run  out  of  space;  we  are  already 
using  both  semispaces. 

Thus we chose to use scheme 1 in our garbage collector.^10


^10 For simplicity, the pointer storage recording algorithms given in sections 5 and 6 do not
show the cases in which copying into dynamic space requires dynamic garbage collection,
and thus do not depict the necessity of examining recording structures that the dynamic
garbage collector may have updated during garbage collection. These extensions
are, however, straightforward.


5 Card-Marking


Two schemes for recording pointer stores were explored at length; only one
of these was implemented. The scheme that was discarded was called
card-marking. It was similar to Moon’s ephemeral garbage collector, as one
might implement it without virtual memory system cooperation; the significant
differences had mostly to do with a desire to keep the pointer-setting
time low.


5.1  Division  of  Memory;  Determination  of  the  Root  Set 

The  root  set  is  recorded  through  a  scheme  much  like  Moon’s.  Memory  is 
divided at a fine level into pieces called cards; these correspond to the 3600’s
pages,  but  we  do  not  call  them  pages  in  order  to  avoid  confusion  with  virtual 
memory  pages  on  the  machine  in  question. 

There  are  two  tables  used  in  determining  the  root  set;  these  are  called  the 
primary  card  mark  table  and  the  secondary  card  mark  table.  The  primary 
card  mark  table  is  a  table  of  bits  directly  mapped  to  all  the  cards  in  the 
address  space.  There  is  one  bit  per  card;  the  bit  is  set  whenever  a  pointer  is 
stored  in  the  card.  Thus  the  bit  is  a  sort  of  dirty  bit  for  the  card;  it  indicates 
that  the  card  has  been  modified  and  must  be  scanned  to  determine  whether 
it  contains  a  reference  to  some  ephemeral  level. 

The  secondary  card  mark  table  is  implemented  sparsely  as  a  collection  of 
tables;  every  segment  that  can  contain  pointers  has  associated  with  it  a 
portion  of  the  secondary  card  mark  table.  The  secondary  card  mark  table 
contains  one  byte  per  card;  each  of  these  card  entries  contains  one  bit  for  each 
ephemeral  level  (thus  we  have  a  maximum  of  eight  ephemeral  levels).  An 
ephemeral  level’s  bit  is  set  to  indicate  that  the  card  contains  a  reference  to 
that  particular  ephemeral  level.  The  secondary  card  mark  table  is  updated 
at  garbage  collection  scan  time. 

The size of cards is a compromise between a desire to keep the card mark
tables small (which argues for a large card size) and a desire to minimize
the amount of time spent in scanning a card already in main memory for
a possibly nonexistent reference to an ephemeral object (which argues for a
small card size). 256 words is probably a good card size on machines with,
say, 28 bits of address space, like the Sun-3; this would give a primary card
mark table size of 32 kilobytes, and the per-segment secondary card mark
table portions would be 64 bytes each. Machines with 32 bits of address
space may motivate larger card sizes, but in no case should the card size
be larger than a virtual memory page frame, because of the much increased
likelihood of unnecessary page faults at scan time.
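
The table sizes quoted above can be checked with a few lines of arithmetic. The sketch below is in Python and assumes 4-byte words (MC68020 longwords) and the 64-kilobyte segments described in Section 6.1.1; the variable names are illustrative only.

```python
WORD_BYTES = 4               # one MC68020 longword
CARD_WORDS = 256             # card size suggested in the text
ADDRESS_BITS = 28            # address space assumed in the text
SEGMENT_BYTES = 64 * 1024    # segment size from Section 6.1.1 (assumed to apply here)

card_bytes = CARD_WORDS * WORD_BYTES                        # 1024 bytes per card
cards_in_address_space = (1 << ADDRESS_BITS) // card_bytes  # 2**18 cards

# Primary table: one bit per card in the whole address space.
primary_table_bytes = cards_in_address_space // 8

# Secondary table: one byte per card, allocated per segment.
secondary_bytes_per_segment = SEGMENT_BYTES // card_bytes
```

With these assumptions the primary table comes to 32 kilobytes and each per-segment portion of the secondary table to 64 bytes, matching the figures in the text.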

As in the proposed stock hardware implementation of Moon’s ephemeral
garbage collector (Section 2.3.2), card-marking requires that the object
creation routines be modified to record in a table the location of the first tagged
word on each card; this allows scanning of cards containing untagged data.

Card-marking  allows  a  compact  representation  of  recorded  pointer  stores. 
What  are  the  dynamic  characteristics  of  this  technique;  i.e.,  what  does  it 
save  us  at  pointer  storage  time?  Assuming  we  have  256-word  cards  and  a  28- 
bit  address  space,  a  card’s  byte  in  the  primary  card  mark  table  is  determined 
by  the  high  15  bits  (13-27)  of  the  28-bit  address;  the  bit  in  question  is  given 
by  bits  10-12  of  the  address.  If  we  place  the  primary  card  mark  table  in 
a  register,  or  at  some  constant  offset  from  a  register,  we  can  mark  a  card 
in  six  instructions  on  the  MC68020,  with  two  temporary  data  registers;  the 
MC68020  code  for  this  is  shown  in  Figure  4. 
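
The index arithmetic can be sketched in Python as follows. This is an illustration only, not the code of the implementation: the function names and the use of a `bytearray` for the primary card mark table are assumptions, while the shift amounts follow from the 256-word (1024-byte) cards and 28-bit addresses assumed above.

```python
CARD_SHIFT = 10   # 256 words * 4 bytes = 1024-byte cards


def card_mark_position(address):
    """Return (byte index, bit index) of an address's card in the primary table."""
    card = address >> CARD_SHIFT     # address bits 10-27 select the card
    return card >> 3, card & 7       # bits 13-27 pick the byte, bits 10-12 the bit


def mark_card(primary_table, address):
    """Set the dirty bit for the card containing the given address."""
    byte_index, bit_index = card_mark_position(address)
    primary_table[byte_index] |= 1 << bit_index


primary_table = bytearray(1 << 15)   # 32 kilobytes for a 28-bit address space
mark_card(primary_table, 0x0000_2C04)
```

The two shifts and the final bit-set correspond to the byte index, bit index, and `bset` steps described in the text; the table is indexed directly rather than through a register offset.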

Here  is  the  advantage  of  card-marking,  then:  with  only  six  instructions 
required  to  make  an  entry  in  the  primary  card  mark  table,  pointer  storage 
can probably continue to be coded in-line, so that the expense of an
out-of-line subroutine call is saved; performance on pointer-storage-intensive
benchmarks that do not discard much storage is likely to be quite good.


5.2  Performing  a  Garbage  Collection 

Garbage  collection  in  a  card-marking  scheme  is  similar  to,  although  simpler 
than, Moon’s ephemeral garbage collection; first in being stop-and-copy, as
opposed  to  incremental,  and  second  in  not  being  directly  concerned  with 
the  virtual  memory  system.  A  Pidgin  ALGOL  description  of  card-marking 
garbage  collection  is  given  in  Figures  5  and  6. 

Following is a description of how a card-marking garbage collection proceeds.

;; Assume that the location to be stored in is in
;; a0, and that the primary card mark table is at a
;; constant offset crdtbl from the systemic
;; quantities vector (SQ), held in an address
;; register.

;; Store the reference location in temporary that
;; will get byte pointer.
        move.l  a0, d0

;; 68K can only immediate shift 8 places; we need 10
        lsr.l   #8, d0
        move.l  d0, d1      ;other temporary for bit field
        lsr.l   #5, d0      ;now we have the byte index in d0
        lsr.l   #2, d1      ;and the bit index in d1

;; note bset will mask all but the low three bits of
;; the bit index
        bset.b  d1, (crdtbl, sq, d0.w)  ;set the bit


Figure 4: Marking a card on the MC68020.

When  we  garbage-collect  an  ephemeral  level,  we  first  scavenge  the  stack  and 
mark  registers,  just  as  with  a  dynamic  GC.  Then  we  consult  the  secondary 
card  mark  table,  and  scan  all  cards  whose  secondary  marks  state  that  they 
contain  pointers  to  this  level,  except  those  within  the  level  itself;  these  need 
not  be  scanned,  because  they  cannot  contain  roots  for  this  level. 

The  scan  proceeds  as  follows:  each  word  in  the  card  is  fetched.  If  the  word 
is  a  pointer  to  the  ephemeral  level  being  garbage-collected,  it  is  treated 
as  a  root,  and  a  scavenge  is  performed  on  the  object  it  points  to;  but,  in 
any  case,  if  after  the  potential  scavenge  the  word  contains  a  pointer  to  any 
ephemeral level at all, a flag for that level is set (note that the placement
of  this  operation  after  the  scavenge  guarantees  that  the  appropriate  level’s 
flag  is  set  -  the  datum  has  changed  levels  in  the  scavenge).  When  the  scan 
of  the  card  is  finished,  the  secondary  card  mark  table  entries  for  this  card 
are  reset  from  these  flags,  and  the  primary  mark  for  this  card  is  cleared,  so 
that  we  can  avoid  scanning  the  card  twice. 
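
A small Python model of this scan is given below as a sketch under simplifying assumptions. A card is modelled as a list of words; a word is either an immediate value or a `Pointer` carrying the ephemeral level of its target (`None` for non-ephemeral space). The `Pointer` class, the `scavenge` callback, and `toy_scavenge` are hypothetical stand-ins, and the skipping of untagged vectors is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pointer:
    target: str              # name of the object pointed to
    level: Optional[int]     # ephemeral level of the target, None if not ephemeral


def scan_and_scavenge_card(card, from_level, to_level, num_levels, scavenge):
    """Scan one card in place and return its new per-level secondary marks."""
    level_flags = [False] * num_levels
    for i, word in enumerate(card):
        if not isinstance(word, Pointer):
            continue                          # immediate constant: nothing to do
        if word.level == from_level:
            word = scavenge(word, to_level)   # treat as a root and relocate
            card[i] = word                    # store the updated pointer
        # Flag the level *after* the scavenge, so a moved datum is
        # recorded under the level it now lives in.
        if word.level is not None:
            level_flags[word.level] = True
    return level_flags


def toy_scavenge(pointer, to_level):
    """Stand-in for a real scavenge: just report the object's new level."""
    return Pointer(pointer.target, to_level)
```

For example, scanning a card that holds a pointer into level 0 while collecting level 0 into level 1 leaves the level 1 flag set and the level 0 flag clear, which is the effect of placing the flag update after the scavenge.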

^11 Note A.4 gives a comparison with the Symbolics approach.

procedure card_marking_gc ();
begin
    push marked registers on stack;
    from_level := 0;
    done := false;
    while (not(done))
    begin
        to_level := from_level + 1, or dynamic space, if
                    from_level is the last ephemeral level;
        scavenge_stack(from_level, to_level);
        for each card number c
            if card_ephemeral_level[c] = from_level
            then begin
                for each level
                    secondary_card_mark_table[c, level] := false;
                primary_card_mark[c] := false
            end
            else
                if secondary_card_mark_table[c, from_level] = true
                then begin
                    scan_and_scavenge_card(c, from_level, to_level);
                    primary_card_mark[c] := false
                end
        if from_level = 0
        then for each card number c
            if primary_card_mark[c] = true
            then begin
                scan_and_scavenge_card(c, from_level, to_level);
                primary_card_mark[c] := false
            end
        if to_level is not full, or is dynamic space
        then done := true
        else from_level := from_level + 1
    end
end


Figure 5: The card-marking garbage collection algorithm.

procedure scan_and_scavenge_card
        (card_number, from_level, to_level);
boolean vector level_flags[number_of_ephemeral_levels];
begin
    for i from 0 until number_of_ephemeral_levels
        do level_flags[i] := false;
    for each word in the card, beginning at the first tagged
            word
    begin
        if the word is an immediate constant
        then continue at the next word;
        if the word is the header of a vector of untagged
                data
        then continue at the first word after the vector;
        if the word points into from_level
        then scavenge_word(word, from_level, to_level);
        if word points to an ephemeral level
        then begin
            i := the level pointed to;
            level_flags[i] := true
        end
    end
    for i from 0 until number_of_ephemeral_levels
        do secondary_card_mark_table[card_number, i] :=
            level_flags[i]
end


Figure 6: Scanning and scavenging a card.

When we have finished scanning all the cards whose secondary marks state
that they contain references to the ephemeral level being garbage-collected,
we scan all the cards whose primary marks are still set, except those within
the ephemeral level being garbage-collected.^12 The scan is exactly like that
performed  for  cards  found  through  the  secondary  card  mark  table;  note  that 
this  means  that  secondary  card  marks  are  updated  and  primary  card  marks 
are  cleared. 

When  we  have  finished  scanning  all  the  cards  whose  primary  marks  were 
set,  we  have  scavenged  all  the  data  in  the  ephemeral  level  that  was  being 
garbage-collected,  and  we  can  clear  its  primary  and  secondary  card  mark 
tables.  Note  that  this  means  that  the  entire  primary  card  mark  table  is  now 
clear. 


5.3  The  Problem  with  Card-Marking 

The  inner  loop  of  the  piece  of  code  that  scans  a  card  on  the  MC68020  was 
presented in Figure 1, on page 13. The time required to scan a 256-word card
on the MC68020 was estimated at 816 microseconds. The gain in pointer-
storage speed afforded by card-marking is substantial, but it was estimated
at less than the loss due to card-scanning.^13

Furthermore, the gain in pointer-storage speed was likely to be lost on
machines with thirty-two bit address spaces, especially those in which the LISP
address  space  was  to  be  organized  sparsely.  Here  a  contiguous  primary  card 
mark  table  would  be  prohibitively  large,  and  so  a  two-level  map  would  be 
necessary;  but  this  would  be  nearly  as  expensive  at  pointer-storage  time 
as  the  scheme  we  actually  implemented,  which  we  shall  call,  for  want  of  a 
better  name,  word-marking. 


^12 Note A.4 discusses the virtual memory behavior that will result from this scanning
order.

^13 This was only an estimate, however, and we have come to wonder whether it was
correct. A prototype card-marking implementation would certainly answer these questions;
see Section 8, however, for other possibilities.


6 Word-Marking

6.1  Recording  the  Root  Set 

The scheme used to record the root set in the Lucid Common LISP
Ephemeral Garbage Collector is called word-marking. Word-marking uses two
different data structures to record the root set. The first is a table of
modification bits; the second is a set of explicitly-managed lists.


6.1.1  Modification  Bit  Tables 

We  divide  the  address  space  into  large  pieces  called  segments;  on  the  MC68020, 
these  are  64  kilobytes  in  length.  Their  exact  size  is  not  critical;  making  them 
64 kilobytes in length allows a simple instruction sequence to extract a
segment number from a pointer.

There  is  for  each  allocated  segment  a  table  of  bits,  called  a  modification  bit 
table  (MBT).  The  MBT  contains  one  bit  for  each  longword  in  the  segment; 
thus,  on  the  MC68020,  MBTs  will  be  2  kilobytes  in  length.  Every  segment 
has  associated  with  it  an  MBT,  but  the  MBTs  are  sparsely  allocated,  in  that 
there  will  be  a  single  MBT  shared  by  all  the  segments  for  which  we  do  not 
need  to  record  pointers  into  ephemeral  space;  these  include  the  segments 
in  the  youngest  ephemeral  level,  the  unscanned  segments,  and  segments 
holding  non-pointer  data.  This  MBT  is  called  the  non-recording  MBT,  and 
is  specially  recognized  by  the  garbage  collector. 
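
The sizes involved, and the sharing of a single non-recording MBT, can be sketched in Python as below. The dictionary `segment_mbts` and the helper names are assumptions made for illustration; the text describes a segment-number indexed table of MBTs but not its representation.

```python
SEGMENT_BYTES = 64 * 1024
LONGWORD_BYTES = 4

longwords_per_segment = SEGMENT_BYTES // LONGWORD_BYTES   # 16384 longwords
mbt_bytes = longwords_per_segment // 8                    # one bit per longword
overhead = mbt_bytes / SEGMENT_BYTES                      # MBT size relative to its segment

# One MBT shared by every segment that needs no recording.
NON_RECORDING_MBT = bytearray(mbt_bytes)

# Hypothetical sparse map: segment number -> that segment's private MBT.
segment_mbts = {}


def segment_number(address):
    """With 64-kilobyte segments, the segment number is the high 16 bits."""
    return address >> 16


def mbt_for(address):
    """Return the MBT for an address, defaulting to the shared non-recording one."""
    return segment_mbts.get(segment_number(address), NON_RECORDING_MBT)
```

Under these assumptions each private MBT is 2 kilobytes, or 1/32 of the segment it describes.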

The  MBTs  reside  in  static  space,  and  are  explicitly  managed  by  the  memory 
manager.  They  are  allocated  in  groups,  and  stored  contiguously,  for  slightly 
better  locality  on  systems  with  page  frames  larger  than  2  kilobytes. 

The  MBT  stores  the  same  sort  of  information  that  the  primary  card  mark 
table  was  to  hold  in  the  card-marking  scheme,  but  the  information  is  stored 
on a per-word basis. That is: in card-marking, when one modifies a location,
the  bit  for  the  card  in  which  the  location  resides  is  set.  In  word-marking, 
when  one  modifies  a  location,  the  MBT  for  the  segment  in  which  the  location 
resides is fetched, and the bit within it corresponding to the location is set.

So that the garbage collector need not examine the MBT for each segment
that might have been modified, there is a table, called the segment
modification cache, which contains one byte for each segment; the byte for a
segment is set nonzero whenever an entry is made in its modification table.
A byte is used for each segment because the table must be modified quickly
when a pointer is stored.

The segment modification cache must also be read quickly at garbage
collection time. On a machine with a 28-bit address space, the segment
modification cache is 4 kilobytes in length.^14 With a longword test, the entries for 4
segments can be checked at once. A table of bits would have allowed quicker
examination, but would make pointer-setting slower by several instructions.
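
The longword test can be illustrated with the following Python sketch, which reads the cache four bytes at a time and looks at individual bytes only when the longword is nonzero. The function name and the `last_allocated_segment` parameter are assumptions; the cache is modelled as a `bytearray`.

```python
import struct

ADDRESS_BITS = 28
SEGMENT_SHIFT = 16                                    # 64-kilobyte segments
NUM_SEGMENTS = 1 << (ADDRESS_BITS - SEGMENT_SHIFT)    # 4096 one-byte entries


def modified_segments(cache, last_allocated_segment):
    """Yield the numbers of segments whose cache byte is nonzero."""
    limit = last_allocated_segment + 1
    for base in range(0, limit, 4):
        # One 32-bit read covers the entries for four segments.
        (longword,) = struct.unpack_from(">L", cache, base)
        if longword == 0:
            continue
        for seg in range(base, min(base + 4, limit)):
            if cache[seg]:
                yield seg
```

Only the segments up to the last allocated one are examined, so a sparsely used address space costs correspondingly little to search.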

Note  that  the  segment  modification  cache  cannot  practically  be  spread 
among the modification bit tables, through some technique where a
designated location in the MBT held a value indicating whether the MBT had
been  modified  since  the  last  garbage  collection.  This  would  allow  us  to  save 
an  instruction  at  pointer  storage  time,  but  would  degrade  virtual  memory 
performance  at  garbage  collection  time,  as  every  allocated  MBT  would  have 
to  be  examined.  The  MBTs  may  occupy  some  3%  of  allocated  storage; 
examining  each  one  could  significantly  increase  virtual  memory  traffic. 

Updating both the modification bit and the segment modification cache
requires some ten instructions,^15 a temporary address register, and two
temporary data registers on the MC68020; the code is shown in Figure 7.

The  length  of  this  instruction  sequence  is  sufficiently  great  that  it  must  be 
coded  as  an  out-of-line  call,  or  significantly  increase  the  amount  of  code  in 
the  LISP  image. 
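
The work done by this sequence on each pointer store can be modelled in Python as below. This is a sketch, not the implementation: `record_pointer_store`, the dictionary `mbt_table`, and the `bytearray` tables are assumed names and representations. The comments relate each step to the register operations described in the text.

```python
SEGMENT_SHIFT = 16          # 64-kilobyte segments
MBT_BYTES = 2048            # one bit per longword in a segment

segment_mod_cache = bytearray(1 << 12)   # one byte per segment, 28-bit addresses
mbt_table = {}                           # hypothetical: segment number -> MBT


def record_pointer_store(address):
    """Record that the longword at `address` has had a pointer stored in it."""
    segment = address >> SEGMENT_SHIFT          # high half of the address
    segment_mod_cache[segment] = 0xFF           # mark the segment as modified
    mbt = mbt_table[segment]                    # fetch this segment's MBT
    longword = (address & 0xFFFF) >> 2          # longword index within the segment
    mbt[longword >> 3] |= 1 << (longword & 7)   # byte in table, then bit in byte
```

For instance, after `mbt_table[3] = bytearray(MBT_BYTES)`, a call to `record_pointer_store(0x00030044)` sets bit 1 of byte 2 in that MBT and makes the cache entry for segment 3 nonzero.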

^14 Of course, in most applications, only a fraction of the address space of the processor
is allocated; the table is only searched as far as the last allocated segment.

^15 By using the MC68020’s memory indirect post-indexed addressing mode, we can
shorten this to nine instructions, and this would be faster on the MC68030; however,
it will be considerably slower in most cases on the MC68020.

;; SQ is an address register holding the base
;; address of a vector of systemic quantities.
;; SMCACHE is the constant offset from the base of
;; the SQ vector to the base of the segment
;; modification cache. MBTTBLS is the constant
;; offset from the base of the SQ vector to the slot
;; holding a pointer to the base of the
;; segment-number indexed table of MBTs.

        move.l  a0, d0      ;location being modified is in a0
        move.l  a0, d1      ;d0 and d1 are temporaries
        swap    d1          ;low 16 bits of d1 now hold segment

;; the segment modification cache lies directly
;; after the SQ vector. Set location to indicate
;; segment modified.
        move.b  #-1, (smcache, SQ, d1.w)

;; For compactness, the next two instructions
;; may be replaced by the single instruction
;;   move.l ([mbttbls, SQ], d1.w*4, 0), a1,
;; but this would be slower on the 68020.

;; Get address of table of modification tables in a1
        move.l  (mbttbls, SQ), a1
;; get address of this segment’s MBT in a1
        move.l  (0, a1, d1.w*4), a1

;; get bit field in d0 (low 3 are bit to set)
        lsr.w   #2, d0
        move.l  d0, d1      ;d1 will be byte in table
;; make 11 bits in low half be byte in table
        lsr.w   #3, d1
        bset.b  d0, 0(a1,d1.w)      ;set the bit


Figure 7: MC68020 code to update an MBT and the segment modification
cache in a word-marking scheme.


6.1.2 Entry Backpointer Lists


The  modification  bit  tables  hold  very  little  information;  we  must  examine 
the  locations  that  were  modified  to  determine  whether  they  contain  pointers 
to  an  ephemeral  level.  We  do  this  at  garbage  collection  time.  The  garbage 
collector  examines  modified  locations  and  adds  the  addresses  of  locations 
that point into an ephemeral level to that ephemeral level’s entry
backpointer list, or EBPL. The EBPLs are explicitly-managed lists maintained
in  a  block  of  static  space  by  the  garbage  collector. 

The  EBPLs  are  managed  in  such  a  fashion  that  their  entries  are  unique; 
there  is  no  duplication  of  entries.  They  are  lists,  rather  than  queues,  because 
those  for  different  levels  grow  and  shrink  dynamically.  They  are  grown  only 
when  the  youngest  ephemeral  level  is  garbage-collected  and  modification 
bit  tables  are  examined.  They  will  also  shrink  at  this  time;  a  modification 
may  mean  that  a  pointer  to  an  ephemeral  level  was  replaced  by  a  pointer  to 
some  other  ephemeral  level,  or  to  a  location  outside  of  ephemeral  space.  The 
management  of  the  EBPLs  is  explicit:  when  an  entry  is  removed,  its  cons 
cell  is  returned  to  a  freelist.  When  the  oldest  ephemeral  level  is  garbage- 
collected  into  dynamic  space,  its  entire  EBPL  is  linked  into  the  freelist. 

Because  the  EBPLs  are  updated  at  garbage  collection  time,  if  a  program 
does  not  create  objects,  it  will  not  pause.  Also  because  the  EBPLs  are 
updated  at  garbage  collection  time,  it  is  possible  in  a  linear  pass  through 
them  to  maintain  unique  entries;  such  a  linear  pass  would  not  be  possible 
at  pointer-setting  time. 
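
The explicit management of EBPL cells might look like the following Python sketch. The class and method names are hypothetical, and the fixed-size pool stands in for the block of static space; `add` reports failure when the freelist is empty, in the spirit of the `add_to_ebpl` routine that returns true if successful.

```python
class Cell:
    """One cons cell of an EBPL: a recorded location and a link."""
    __slots__ = ("location", "next")

    def __init__(self):
        self.location = None
        self.next = None


class EBPLPool:
    def __init__(self, num_cells):
        self.freelist = None
        for _ in range(num_cells):          # carve the pool into free cells
            cell = Cell()
            cell.next = self.freelist
            self.freelist = cell

    def add(self, head, location):
        """Push a location onto the list `head`; return (new head, success)."""
        cell = self.freelist
        if cell is None:
            return head, False              # no free cells left
        self.freelist = cell.next
        cell.location, cell.next = location, head
        return cell, True

    def remove_if(self, head, predicate):
        """Unlink entries whose location satisfies predicate, freeing their cells."""
        kept_head = kept_tail = None
        while head is not None:
            following = head.next
            if predicate(head.location):
                head.next = self.freelist   # return the cell to the freelist
                self.freelist = head
            else:
                head.next = None
                if kept_tail is None:
                    kept_head = head
                else:
                    kept_tail.next = head
                kept_tail = head
            head = following
        return kept_head

    def release(self, head):
        """Link an entire list onto the freelist in one step."""
        if head is None:
            return
        tail = head
        while tail.next is not None:
            tail = tail.next
        tail.next = self.freelist
        self.freelist = head


def locations(head):
    """Collect the locations on a list, front to back."""
    out = []
    while head is not None:
        out.append(head.location)
        head = head.next
    return out
```

Because removal is a single linear pass performed at garbage collection time, it is a natural place to enforce uniqueness of entries as well.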

Note that the use of EBPLs for level-specific information means that
word-marking imposes no limitation on the number of ephemeral levels allowed.


6.2  Performing  a  Garbage  Collection 

A  Pidgin  ALGOL  routine  that  performs  a.  word-marking  garbage  collection 
is  shown  in  Figures  8  and  9.  An  explanation  of  its  working  follows. 

Initially,  we  scan  the  stack  and  ma.rk  registers,  just  as  with  a  dynamic  GC. 
Then  we  make  a  pass  over  the  EBPLs.  Any  EBPL  entry  corresponding 
to  a  location  whose  MBT  entry  is  set  is  elided;  the  scavenge  from  MBTs 
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procedure  ¥ord_marking_gc  () ; 
begin 

push  marked  registers  on  stack; 
from_level  ;=  0; 
done  :=  false; 
mbts_not_empty  :=  true; 
f ailed_to_clear_iabt  :=  false; 
while  (not (done))  begin 

to_level  :=  from_level  +  1,  or  dynamic  space,  if 

from.level  is  the  last  ephemeral  level; 
scavenge_stack(from_level,  to_level) ; 
for  each  location  in  each  EBPL 
if  mbt_entry_set (location) 

then  remove  the  location  from  the  EBPL; 
for  each  location  in  the  EBPL  for  from_level  begin 
if  location  is  in  from_level  or  to_level 
then  remove  the  location  from  the  EBPL; 
if  location  is  not  in  from_level 

then  scavenge_word(location,  from.level, 

to_level) 

end 

if  mbts_not_empty  then  begin 

for  each  segment  whose  segment  modification 
cache  entry  is  set 

if  segment_mbt(segment)  =  non_recording_mbt 
then  clear_segment_mod_cache_ entry (segment) 
else  if  scavenge_segment_from_mbt 

(segment,  from_level,  to_level) ; 
then  clear_segment_raod_cache_entry (segment) 
else  f ailed_to_clear_mbt  :=  true; 
rabts_not_empty  :=  f ailed_to_clear_mbt 
end 

if  to_level  is  dynamic  space 

then  link  frora_level’s  EBPL  to  the  EBPL  freelist 
else  link  from_level’s  EBPL  to  to_level’s  EBPL; 
set  from_level’s  EBPL  to  nil; 
if  to_level  is  not  full,  or  is  dynamic  space 
then  done  :=  true 
else  from_level  :=  from_level  +  1 

end 

end 


Figure  8:  The  word-marking  garbage  collection  algorithm. 
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procedure  scavenge_segment_from_mbt(seginent ,  froin_level, 

to_level) ; 


begin 

mbt  :=  segment_mbt (segment) ; 
entries_not_cleared  :=  false; 

for  each  location  corresponding  to  a  set  entry  in  mbt 
if  the  location  contains  immediate  constant  data,  or 
does  not  point  into  an  ephemeral  level 
then  clear _mbt_entry(location) 
else  begin 

level  :=  the  ephemeral  level  into  which  location 
points ; 

if  level  =  from.level 

then  scavenge_word(location,  from_level, 

to_level) ; 

if  location  itself  is  in  to_level 
then  clear_entry  :=  true 

comment  add_to_ebpl  returns  true  if  successful 
else  clear_entry  :=  add_to_ebpl (level ,  location); 
if  clear. entry 

then  clear_mbt_entry(location) 
else  entries_not_cleared  :=  true; 

end 

return(not (entries_not_cleared) ) ; 
end 


Figure  9:  Scavenging  the  words  in  a  segment  that  point  into  ephemeral  space 
by  examining  its  modification  bit  table. 
performed  later  will  examine  the  contents  of  the  location  and  determine 
which  EBPL  it  should  now  be  placed  on,  if  any.  This  operation  guarantees 
uniqueness  of  entries  on  EBPLs,  and  also  guarantees  that  for  the  rest  of  the 
garbage  collection,  the  locations  listed  in  EBPLs  will  be  known  to  contain 
pointers  into  the  corresponding  ephemeral  level. 

Then we fetch the EBPL for this level.^16 For each location listed in the
EBPL,  we  determine  whether  the  location  is  in  the  space  being  garbage- 
collected  or  the  space  being  garbage-collected  into;  in  either  case  we  reclaim 
the  EBPL  cons.  We  can  do  so  because  pointers  stored  within  an  ephemeral 
level  are  not  considered  part  of  the  root  set  for  that  ephemeral  level;  thus 
they  should  not  be  on  the  level’s  EBPL. 

If the location is not in fromspace,^17 we scavenge it.
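
As an illustration only, the EBPL pass just described can be modelled in a few lines of Python. This is a sketch under simplifying assumptions, not the Lucid implementation: locations are arbitrary hashable values, `space_of` is a hypothetical function that reports which space a location lives in, and `scavenge_word` is a caller-supplied stand-in for the scavenging routine named in Figure 8.

```python
def process_ebpl(ebpl, space_of, from_level, to_level, scavenge_word):
    """Return the EBPL entries that survive the pass.

    ebpl          -- list of locations recorded for from_level
    space_of      -- hypothetical helper: location -> space it lives in
    scavenge_word -- callback standing in for the real scavenger
    """
    kept = []
    for location in ebpl:
        home = space_of(location)
        # Entries for locations inside fromspace or tospace are reclaimed:
        # pointers stored within a level are not roots for that level.
        if home not in (from_level, to_level):
            kept.append(location)
        # A location outside fromspace may still refer to live data in it.
        if home != from_level:
            scavenge_word(location, from_level, to_level)
    return kept


# Four locations living in level 0, level 1, level 2 and dynamic space.
homes = {"a": 0, "b": 1, "c": 2, "d": "dynamic"}
scavenged = []
kept = process_ebpl(list(homes), homes.get, 0, 1,
                    lambda loc, src, dst: scavenged.append(loc))
print(kept)        # ['c', 'd']
print(scavenged)   # ['b', 'c', 'd']
```

In this toy run, the entries for `a` and `b` are reclaimed because they live in fromspace and tospace respectively, while `b`, `c` and `d` are all scavenged because none of them lies in fromspace.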

Now  we  scan  the  segment  modification  cache  entries  for  the  segments  that 
may  contain  pointers  to  this  ephemeral  level.  When  we  find  a  segment  that 
has  been  modified,  we  fetch  and  examine  its  MBT.  If  the  MBT  is  the  unique 
non-recording  MBT,  we  need  not  examine  it  further,  as  it  is  associated  only 
with  segments  that  cannot  contain  ephemeral  references.  Otherwise,  the 
MBT  is  examined  a  word  at  a  time  for  nonzero  entries;  because  this  is 
simply  a  check  for  nonzero  entries,  it  is  a  two-instruction  dbne  loop  on 
the  MC68020,  performed  for  512  words.  The  scan  could  be  optimized  by 
recording  the  least  and  greatest  locations  modified  in  the  segment,  but  this 
would  make  pointer  storage  still  slower. 
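
The word-at-a-time scan can be sketched as follows. The figure of 512 words per MBT comes from the text; the width of 32 bits per word, the one-bit-per-location layout, and the least-significant-bit-first ordering are assumptions made only for this illustration, and `set_entries` is a hypothetical name.

```python
WORDS_PER_MBT = 512   # from the text
BITS_PER_WORD = 32    # assumption: one modification bit per location


def set_entries(mbt):
    """Yield the index of every set entry in a modification bit table."""
    for word_index, word in enumerate(mbt):
        if word == 0:
            # The common case: nothing was recorded in this word, so the
            # scan moves on without looking at individual bits.
            continue
        for bit in range(BITS_PER_WORD):
            if word & (1 << bit):
                yield word_index * BITS_PER_WORD + bit


mbt = [0] * WORDS_PER_MBT
mbt[3] = 0b101         # entries 96 and 98
mbt[511] = 1 << 31     # the last entry in the table
print(list(set_entries(mbt)))   # [96, 98, 16383]
```

The inner loop runs only for nonzero words, which is what makes the outer check cheap when few locations in a segment have been modified.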

When  a  modified  location  is  found,  its  contents  are  examined;  if  these  do 
not  point  into  an  ephemeral  level  or  are  constant  immediate  data,  the  MBT 
entry  for  the  location  is  cleared,  and  the  search  continues  at  the  next  word. 
If  the  location  contains  a  pointer  to  an  ephemeral  object,  then,  if  the  object 
is in the level being garbage-collected, that is, fromspace, the location is

^16 Note that, if this is a garbage collection of the youngest ephemeral level, the EBPL
will be empty, because the modification table is scanned only when a garbage collection
happens (but see Note A.5).

^17 This check is redundant if the algorithm is implemented exactly as shown, as we
reclaim EBPL entries for locations in tospace before linking fromspace’s EBPL into
tospace’s EBPL, to reduce the chance of running out of EBPL conses. If the obvious
simplification is made, however, this step would be necessary to allow the reclamation
of EBPL conses and ephemeral objects pointed to only by dead ephemeral objects at later
levels; for example, the ephemeral garbage collector could not otherwise reclaim
circular structures spread across more than one ephemeral level.
scavenged.  If  the  location  itself  was  in  tospace,  we  simply  clear  its  MBT 
entry;  this  guarantees  that  MBTs  for  empty  levels  are  cleared.  Otherwise 
we  attempt  to  add  it  to  the  appropriate  EBPL.  If  we  were  not  out  of  EBPL 
conses  and  thus  succeeded  in  adding  the  location  to  the  appropriate  EBPL, 
we  may  clear  the  location’s  MBT  entry,  but  otherwise  we  must  leave  it  set, 
so  that  we  examine  the  location  at  the  next  garbage-collection,  as  it  is  known 
to  contain  an  ephemeral  reference.  This  state,  in  which  the  EBPL  freelist  is 
empty,  will  not  persist,  because  eventually  a  garbage  collection  of  the  oldest 
ephemeral  level  will  happen;  when  it  completes,  all  EBPLs  will  again  be 
null. 
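
The handling of each marked location, including the case in which the EBPL freelist is exhausted, can be modelled as below. This is a simplified sketch of the logic in Figure 9, not the actual implementation: the MBT is represented as a set of marked locations, the dictionaries `points_into` and `home_of` are hypothetical stand-ins for examining memory, and the list `scavenged` merely records where `scavenge_word` would have been called.

```python
def scavenge_segment_from_mbt(marked, points_into, home_of, from_level,
                              to_level, ebpls, free_conses, scavenged):
    """Return (all_entries_cleared, remaining_free_conses).

    marked      -- set of locations whose MBT entries are set (mutated)
    points_into -- location -> ephemeral level pointed into, or None
    home_of     -- location -> space in which the location itself lives
    ebpls       -- level -> list of recorded locations (mutated)
    """
    cleared_all = True
    for loc in sorted(marked):
        level = points_into.get(loc)
        if level is None:
            # Immediate data or a non-ephemeral pointer: just clear.
            marked.discard(loc)
            continue
        if level == from_level:
            scavenged.append(loc)          # stands in for scavenge_word
        if home_of[loc] == to_level:
            clear = True                   # no EBPL entry is needed
        elif free_conses > 0:
            ebpls[level].append(loc)       # add_to_ebpl succeeded
            free_conses -= 1
            clear = True
        else:
            clear = False                  # out of EBPL conses: stay marked
        if clear:
            marked.discard(loc)
        else:
            cleared_all = False
    return cleared_all, free_conses


marked = {10, 20, 30, 40}
points_into = {10: None, 20: 0, 30: 1, 40: 0}
home_of = {10: "dynamic", 20: "dynamic", 30: "dynamic", 40: 1}
ebpls = {0: [], 1: [], 2: []}
scavenged = []
result = scavenge_segment_from_mbt(marked, points_into, home_of, 0, 1,
                                   ebpls, 1, scavenged)
print(result, marked, ebpls[0], scavenged)
# (False, 0) {30} [20] [20, 40]
```

With only one free cons available, location 30 cannot be recorded, so its entry stays set and the segment must be examined again at the next garbage collection.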

When we have finished scavenging ephemeral references recorded in the modification
tables, we have copied all the live data out of the ephemeral level
being  garbage-collected.  If  we  succeeded  in  clearing  all  MBTs,  we  note  the 
fact,  so  that  we  need  not  examine  the  segment  modification  cache  on  the 
next  garbage  collection.  Note  that  we  have  necessarily  cleared  the  segment 
modification  cache  entries  and  MBTs  for  the  segments  in  fromspace;  we  have 
also  elided  from  the  EBPL  for  fromspace  all  entries  whose  locations  were  in 
tospace.  We  must  now  update  the  EBPL  for  tospace  to  include  entries  for 
locations  that  point  to  the  data  just  copied  into  it;  this  we  do  by  linking  to 
its  end  the  EBPL  for  fromspace,  and  setting  the  EBPL  for  fromspace  to  nil. 
7  Performance  Measurements  and  Analysis 


We describe two sets of performance measurements. The first was collected
in January of 1988, on a preliminary release of the ephemeral garbage
collector running on various Sun workstations, and covers several of the
Gabriel benchmarks [7]. The second was collected in the summer of 1987, on
a prototype version of the system running the Lucid Common LISP production
compiler on Apollo workstations.


7.1  Performance  on  the  Gabriel  Benchmarks 

The Sun benchmarks are summarized in Tables 1, 2, and 3. Timings were
measured on single-user machines with network paging over a 10-megabaud
Ethernet to a Sun-3/180 file server; the paging devices on the file server
were fast disks operating through SMD interfaces. There was essentially no
network contention when the timing data were collected.

All three machines used in benchmark timings used MC68020 processors
clocked at 16 megahertz. The Sun-3/75 was configured with 4 megabytes
of physical memory, the Sun-3/110 with 8 megabytes, and the Sun-3/180
with 16 megabytes.

The  benchmarks  were  run  with  the  operating  system  in  single-user  mode  to 
avoid  any  anomalies  from  running  daemons.  They  were  compiled  with  the 
Lucid  Common  LISP/Sun  3.0  production  compiler,  with  speed  and  safety 
settings  of  3  and  0,  respectively.  The  version  of  LISP  used  was  a  beta-test 
version  of  Lucid  Common  LISP/Sun  Release  3.0.  It  was  configured  with 
82-segment  dynamic  semispaces  (5.4  megabytes  each),  and  three  ephemeral 
levels;  level  0  was  8  segments  (512  kilobytes)  in  length,  and  levels  1  and 
2  were  each  10  segments  (640  kilobytes)  in  length.  In  each  case  the  best 
timing  from  several  runs  is  given.  These  benchmarks  are  described  in  detail 
in  [7]. 
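
The space sizes quoted above are mutually consistent if a segment is 64 kilobytes, which follows from "8 segments (512 kilobytes)". The short check below is only a worked arithmetic example; the constant name `SEGMENT_BYTES` and the dictionary layout are ours.

```python
SEGMENT_BYTES = 64 * 1024   # inferred from "8 segments (512 kilobytes)"

# Segment counts taken from the configuration described in the text.
config = {
    "dynamic semispace": 82,
    "ephemeral level 0": 8,
    "ephemeral level 1": 10,
    "ephemeral level 2": 10,
}

for name, segments in config.items():
    size = segments * SEGMENT_BYTES
    print(f"{name:18} {segments:3} segments {size // 1024:5} KB "
          f"{size / 1e6:4.1f} MB")

ephemeral_kb = sum(n for name, n in config.items()
                   if name.startswith("ephemeral")) * SEGMENT_BYTES // 1024
print("total ephemeral space:", ephemeral_kb, "KB")   # 1792 KB
```

Under this reading, the 5.4-megabyte figure for a semispace is in decimal megabytes (82 x 65,536 = 5,373,952 bytes), and the three ephemeral levels together occupy about one third of a semispace.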

In  most  cases,  ephemeral  garbage  collection  reduced  the  elapsed  real  time 
for  execution  of  these  benchmarks;  this  is  especially  so  in  cases  where  several 
dynamic  garbage  collections  had  to  be  performed.  The  difference  is  most 
(dotimes (i 10) (boyer-setup) (boyer-test))

                                                     Garbage Collector
Processor           Parameter Measured           Dynamic only    Ephemeral
------------------  ---------------------------  ------------  -----------
Sun-3/75            Elapsed Real Time                   235.5        206.7
16 MHz MC68020      CPU Time                            162.7       190.46
4 MB main memory    Dynamic Bytes Consed           18,139,280    1,665,216
                    Dynamic Garbage Collections             3            0

Sun-3/110           Elapsed Real Time                   244.2        176.5
16 MHz MC68020      CPU Time                            166.4        174.4
8 MB main memory    Dynamic Bytes Consed           18,139,280    1,665,216
                    Dynamic Garbage Collections             3            0

Sun-3/180           Elapsed Real Time                   148.6        171.9
16 MHz MC68020      CPU Time                      [illegible]        171.9
16 MB main memory   Dynamic Bytes Consed           18,139,576    1,533,040
                    Dynamic Garbage Collections             4            0

Table 1: BOYER benchmark timings. Times are in seconds.
(dotimes (i 20) (deriv-run))

                                                     Garbage Collector
Processor           Parameter Measured           Dynamic only    Ephemeral
------------------  ---------------------------  ------------  -----------
Sun-3/75            Elapsed Real Time                   389.0         98.4
16 MHz MC68020      CPU Time                            172.9         89.0
4 MB main memory    Dynamic Bytes Consed           39,205,320            0
                    Dynamic Garbage Collections             7            0

Sun-3/110           Elapsed Real Time                   387.8         89.8
16 MHz MC68020      CPU Time                      [illegible]         89.2
8 MB main memory    Dynamic Bytes Consed           39,205,320            0
                    Dynamic Garbage Collections             7            0

Sun-3/180           Elapsed Real Time             [illegible]         89.3
16 MHz MC68020      CPU Time                      [illegible]         89.3
16 MB main memory   Dynamic Bytes Consed           39,205,320            0
                    Dynamic Garbage Collections             7            0

Table 2: DERIV benchmark timings. Times are in seconds.
(dotimes (i 100) (destructive 600 50))

                                                     Garbage Collector
Processor           Parameter Measured           Dynamic only    Ephemeral
------------------  ---------------------------  ------------  -----------
Sun-3/75            Elapsed Real Time                   415.8        196.3
16 MHz MC68020      CPU Time                            249.0        178.1
4 MB main memory    Dynamic Bytes Consed           34,489,080            0
                    Dynamic Garbage Collections             6            0

Sun-3/110           Elapsed Real Time                   437.2  [illegible]
16 MHz MC68020      CPU Time                            259.6  [illegible]
8 MB main memory    Dynamic Bytes Consed           34,489,080            0
                    Dynamic Garbage Collections             6            0

Sun-3/180           Elapsed Real Time                   194.5        177.5
16 MHz MC68020      CPU Time                            191.5        177.5
16 MB main memory   Dynamic Bytes Consed           34,489,080            0
                    Dynamic Garbage Collections             6            0

Table 3: DESTRUCTIVE benchmark timings. Times are in seconds.
dramatic on the machines configured with less memory; this is because
ephemeral garbage collection drastically reduces the size of the working
set. Note that the difference between central processing unit time and real
time on these machines is large under dynamic garbage collection, and much
smaller under ephemeral garbage collection; as the machines were running in
single-user mode, the discrepancy between real and central processing unit
times is due almost entirely to virtual memory system overhead.
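
To make the size of this discrepancy concrete, the following worked example subtracts CPU time from elapsed real time for the Sun-3/75 rows of Tables 1, 2, and 3. It is plain arithmetic on the published figures; the dictionary layout and the helper name `non_cpu_seconds` are ours.

```python
# (elapsed, cpu) in seconds for the Sun-3/75, copied from Tables 1-3.
sun_3_75 = {
    "BOYER":       {"dynamic": (235.5, 162.7), "ephemeral": (206.7, 190.46)},
    "DERIV":       {"dynamic": (389.0, 172.9), "ephemeral": (98.4, 89.0)},
    "DESTRUCTIVE": {"dynamic": (415.8, 249.0), "ephemeral": (196.3, 178.1)},
}


def non_cpu_seconds(elapsed, cpu):
    """Elapsed time that is not accounted for by CPU time."""
    return round(elapsed - cpu, 2)


gaps = {bench: {gc: non_cpu_seconds(*times) for gc, times in row.items()}
        for bench, row in sun_3_75.items()}

for bench, row in gaps.items():
    print(f"{bench:12} dynamic {row['dynamic']:6.1f} s   "
          f"ephemeral {row['ephemeral']:5.1f} s")
```

On this machine the time not spent in the processor falls from roughly 73, 216, and 167 seconds under dynamic garbage collection to roughly 16, 9, and 18 seconds under ephemeral garbage collection.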

It  is  interesting  to  note  that,  in  some  cases,  ephemeral  garbage  collection 
reduced  the  amount  of  central  processing  unit  time  required  for  the  execution 
of  a  benchmark.  We  expect  that  the  reduced  size  of  the  root  set  accounts  for 
much  of  the  performance  improvement,  as,  among  these  benchmarks,  only 
BOYER  retains  for  long  the  large  structures  created.  Thus  it  seems  unlikely 
that  much  transporting  occurred  in  dynamic  garbage  collection. 

The  DESTRUCTIVE  benchmark  timings  (Table  3)  show  particularly  good 
performance under ephemeral garbage collection. Reference to the source
code reveals one reason: only two of the six destructive operations used
result in invocation of the out-of-line subroutine that records pointer stores.
The  others  are  stores  either  of  declared  fixnums  or  constant  symbols;  as 
noted  in  Section  4.1,  the  compiler  can  optimize  out  pointer-recording  in 
these  cases. 

In  the  BOYER  timings  (Table  1),  we  see  a  pattern  characteristic  of  the 
Lucid  ephemeral  garbage  collector:  enhanced  virtual  memory  performance 
is  gained  at  the  expense  of  increased  central  processing  unit  load.  The 
size  of  the  working  set  has  been  decreased;  the  improved  virtual  memory 
performance  has  resulted  in  reduced  elapsed  times  to  perform  a  task,  but 
at a cost of more work for the central processing unit. On machines with
better virtual memory performance, use of the ephemeral garbage collector
is less attractive.


7.2  Performance  of  the  Compiler  Under  Ephemeral  Garbage 
Collection 

The  Apollo  performance  measurements  are  summarized  in  Table  4.  These 
measurements  were  taken  on  single-user  machines  with  local  paging  and  file 
disks;  network  traffic  was  virtually  nil.  The  Apollo  DN4000  processor  is  an 
                                                     Garbage Collector
Processor           Parameter Measured           Dynamic only    Ephemeral
------------------  ---------------------------  ------------  -----------
DN570-T             Elapsed Real Time                32,171.5     28,554.7
20 MHz MC68020      CPU Time                         16,429.0     20,275.1
8 MB main memory    Process Disk I/O                  499,881      285,034
154 MB disk, ST506  Dynamic Bytes Consed          722,183,608   87,741,088

DN3000              Elapsed Real Time                30,269.8     31,535.6
12 MHz MC68020      CPU Time                         19,768.3     26,316.8
8 MB main memory    Process Disk I/O                  485,096      256,155
348 MB disk, ESDI   Dynamic Bytes Consed          722,163,960   90,460,272

DN4000              Elapsed Real Time                12,358.1     15,693.7
25 MHz MC68020      CPU Time                         11,680.3     15,012.1
32 MB main memory   Process Disk I/O                    8,226        6,484
348 MB disk, ESDI   Dynamic Bytes Consed          722,121,048   92,014,816

Table 4: Global recompilation performance measurements on Apollo
workstations. Times are in seconds.

MC68020  clocked  at  25  megahertz;  the  DN4000  in  question  was  configured 
with  32  megabytes  of  physical  memory  and  a  fast  348  megabyte  disk  drive 
operating  through  an  ESDI  interface.  The  DN3000  processor  is  an  MC68020 
clocked  at  12  megahertz;  the  DN3000  used  was  configured  with  8  megabytes 
of  physical  memory  and  a  fast  348  megabyte  disk  drive  with  the  same  ESDI 
interface  as  on  the  DN4000.  Finally,  the  DN570-T  processor  is  an  MC68020 
clocked  at  20  megahertz;  the  DN570-T  used  for  performance  measurements 
was  configured  with  8  megabytes  of  physical  memory  and  a  relatively  slow 
154  megabyte  disk  drive  operating  through  an  ST506  interface. 

The task executed was a recompilation of all the files in the LISP system;^18
it also served as a testbed for debugging the ephemeral garbage collector
prototype.

^18 In its default configuration, the Apollo version of the Lucid Common LISP compiler
makes use of a facility that allows block allocation and deallocation of temporary storage.
In the results shown here, this facility has been disabled, as such block allocation and
deallocation are not available to user programs, and our intention is to provide an analysis
of the behavior under ephemeral garbage collection of large programs utilizing the usual
storage allocation facilities.
Use of the ephemeral garbage collector degraded performance in the global
recompilation task on both the DN4000 and the DN3000. On the DN4000,
the task took 27% more elapsed real time under ephemeral garbage collection
than under dynamic garbage collection; on the DN3000, the figure was about
4.2%. On the DN570-T, however, the elapsed real time under ephemeral
garbage collection was 11.2% less than under dynamic garbage collection.
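
These percentages follow directly from the elapsed real times in Table 4, as the worked example below shows. The figures are copied from the table; the helper name `percent_change` is ours.

```python
# Elapsed real time in seconds from Table 4: (dynamic only, ephemeral).
elapsed = {
    "DN570-T": (32171.5, 28554.7),
    "DN3000":  (30269.8, 31535.6),
    "DN4000":  (12358.1, 15693.7),
}


def percent_change(dynamic, ephemeral):
    """Change in elapsed time under ephemeral GC, relative to dynamic GC."""
    return 100.0 * (ephemeral - dynamic) / dynamic


for machine, (dyn, eph) in elapsed.items():
    print(f"{machine:8} {percent_change(dyn, eph):+6.1f}%")
```

A negative value means that the task finished sooner under ephemeral garbage collection; only the DN570-T, the machine with the slowest paging device, shows one.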

These results confirm the conclusion we reached in examining our timings
on Sun workstations: the Lucid Ephemeral Garbage Collector improves virtual
memory performance at the expense of central processing unit time.
Examination of the “Disk I/O” figures shows a reduction in virtual memory
traffic on all three systems. On the DN3000 and DN570-T this reduction
was in excess of 42% of the traffic observed under dynamic garbage
collection; on the DN4000, configured with four times the amount of physical
memory, the reduction in disk I/O was only 21%. The DN3000 and DN570-T
disk I/O figures are very close, as would be expected from machines with
identical amounts of physical memory; however, the DN570-T’s faster
processor gives it a lower elapsed time under ephemeral garbage collection.
Under dynamic garbage collection, this advantage is reversed by the
DN3000’s faster paging device.
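
The disk I/O reductions quoted above can be recomputed from the "Process Disk I/O" rows of Table 4. Again this is only a worked example on the published counts; the helper name `reduction` is ours.

```python
# Process disk I/O counts from Table 4: (dynamic only, ephemeral).
disk_io = {
    "DN570-T": (499881, 285034),
    "DN3000":  (485096, 256155),
    "DN4000":  (8226, 6484),
}


def reduction(dynamic, ephemeral):
    """Percentage by which ephemeral GC reduced disk I/O."""
    return 100.0 * (dynamic - ephemeral) / dynamic


for machine, counts in disk_io.items():
    print(f"{machine:8} {reduction(*counts):5.1f}% less disk I/O")
```

The two 8-megabyte machines come out at about 43% and 47%, and the 32-megabyte DN4000 at about 21%, starting from a disk I/O count that is already some sixty times smaller.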

On the whole, however, these measurements show much worse performance
under ephemeral garbage collection while performing a global recompilation
on Apollo workstations than while running benchmarks on Sun workstations.
Leaving aside momentarily the fact that the tasks being performed
are different, we would still expect some discrepancy in performance, due to
the different virtual memory characteristics of the systems being compared.

The Sun-3 has 8-kilobyte page frames, as compared to the Apollo’s 1-kilobyte
page frames; the coarser page size hurts performance in a pointer-oriented
language with heap allocation, like LISP. Furthermore, the Apollos whose
performance  we  measured  had  more  memory  and  faster  paging  devices  than 
did  the  Suns.  But  we  also  see  wide  discrepancies  in  factors  besides  virtual 
memory  behavior;  in  particular,  the  central  processing  unit  time  expense 
of  ephemeral  garbage  collection  is  far  greater  in  the  Apollo  performance 
measurements. 

Of  course,  we  are  comparing  apples  and  oranges;  the  tasks  being  performed 
were  different.  What  is  interesting  is  how  they  are  different.  The  compiler 
makes  heavy  use  of  hash  tables,  especially  when  reading  input  files,  and 
hash  tables  are  invalidated  by  copying  garbage  collection,  as  the  hash  values 
of  objects  stored  in  them  are  computed  from  their  addresses,  which  are 
assumed  to  have  changed.  Thus  references  to  hash  tables  between  garbage 
collections  require  recomputing  hash  values  for  all  objects  in  the  tables. 
Because  ephemeral  garbage  collections  occur  so  much  more  often  than  do 
dynamic  garbage  collections,  this  task  will  have  to  be  performed  many  times 
more  often  under  ephemeral  garbage  collection;  we  believe  that  this  accounts 
for  a  lot  of  the  central  processing  unit  time. 
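
The effect can be illustrated with a toy model. The sketch below is not the Lucid hash table implementation: `Heap`, `AddressHashTable`, and `rehash_work` are hypothetical names, objects are represented by strings, and the "addresses" are simply counters that change on every simulated copying collection. It counts how many entries are rehashed when the table is consulted after collections of differing frequency.

```python
class Heap:
    """Toy heap in which every copying collection moves every object."""

    def __init__(self):
        self.epoch = 0
        self.address = {}      # object name -> current address
        self.next_free = 0

    def allocate(self, name):
        self.address[name] = self.next_free
        self.next_free += 8
        return name

    def collect(self):
        self.epoch += 1
        for name in self.address:
            self.address[name] = self.next_free
            self.next_free += 8


class AddressHashTable:
    """Table hashed on addresses, rehashed on first use after a collection."""

    def __init__(self, heap, nbuckets=16):
        self.heap, self.nbuckets = heap, nbuckets
        self.items, self.buckets = {}, {}
        self.epoch = heap.epoch
        self.rehashed_entries = 0

    def _bucket(self, name):
        return self.heap.address[name] // 8 % self.nbuckets

    def _rehash_if_stale(self):
        if self.epoch != self.heap.epoch:
            self.buckets = {}
            for name in self.items:
                self.buckets.setdefault(self._bucket(name), []).append(name)
            self.rehashed_entries += len(self.items)
            self.epoch = self.heap.epoch

    def put(self, name, value):
        self._rehash_if_stale()
        if name not in self.items:
            self.buckets.setdefault(self._bucket(name), []).append(name)
        self.items[name] = value

    def get(self, name):
        self._rehash_if_stale()
        if name in self.buckets.get(self._bucket(name), []):
            return self.items[name]
        return None


def rehash_work(gc_every, lookups=1000, entries=100):
    heap = Heap()
    table = AddressHashTable(heap)
    names = [heap.allocate(f"obj{i}") for i in range(entries)]
    for name in names:
        table.put(name, name.upper())
    for i in range(1, lookups + 1):
        if i % gc_every == 0:
            heap.collect()
        assert table.get(names[i % entries]) == names[i % entries].upper()
    return table.rehashed_entries


print(rehash_work(gc_every=500))   # 200
print(rehash_work(gc_every=10))    # 10000
```

With the same 1000 lookups on a 100-entry table, collecting fifty times as often multiplies the rehashing work by fifty, which is the pattern we believe the compiler exhibits under ephemeral garbage collection.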

Other  measurements  have  led  us  to  believe  that  the  default  configuration 
of  ephemeral  spaces  is  less  than  ideal  for  use  of  the  compiler.  The  compiler 
creates  large  data  structures  while  compiling  a  file,  and  retains  some  of 
them  through  the  entire  compilation  of  the  file;  thus  there  is  the  possibility 
that  these  large  structures  will  be  moved  several  times  by  ephemeral  garbage 
collection,  and  finally  advanced  into  dynamic  space,  where  they  are  released. 
We  have  in  fact  observed  this  behavior  by  metering  compilation. 

This is not a problem peculiar to the compiler; we expect that many
programs that build large temporary data structures will exhibit similar
behavior. Note that, because these structures are built out of small parts,
the automatic allocation of very large objects in dynamic space (see
Section 4.5) is of no help here. This is called the pig-in-the-snake problem.^19
In  general,  it  can  be  solved  only  by  tuning  the  number  and  the  sizes  of 
ephemeral  levels  for  optimal  behavior  on  the  problem  at  hand.  In  the  case 
of  the  compilation  benchmark,  we  can  see  that  a  greater  delay  between 
garbage  collection  times,  as  occurs  in  dynamic  garbage  collection  (because 
semispaces  are  larger),  would  result  in  moving  these  structures  less  often  or 
possibly  not  at  all;  they  might  perish  first.  We  expect  that  a  larger  first 
ephemeral  level  would  have  much  the  same  effect. 

Finally,  the  compiler  performance  measurements  were  made  on  an  earlier 
version  of  the  system;  some  of  the  continued  development  in  the  interim  may 
have  led  to  better  performance  in  the  later  benchmarks.  Further  testing  is 
planned  to  analyze  compiler  performance  with  the  current  system. 


^19 I am indebted to Jon L. White for this terminology.
8  Conclusions  and  Future  Work 

8.1  Conclusions 

The  performance  analysis  presented  in  Section  7  may  be  summarized  broadly 
and  in  brief: 


•  At tasks in which large amounts of data are allocated and then
discarded, the Lucid Ephemeral Garbage Collector reduces both the elapsed
real time and the central processing unit time required.

•  At  tasks  in  which  large  amounts  of  data  are  allocated  and  retained, 
the  ephemeral  garbage  collector  will  enhance  performance  by  reducing 
the  size  of  the  working  set,  gaining  virtual  memory  performance  (and 
thus  elapsed  real  time)  at  the  expense  of  central  processing  unit  time. 

•  On  processors  with  very  good  virtual  memory  performance  (those 
configured  with  large  amounts  of  physical  memory  and  fast  paging 
devices)  the  ephemeral  garbage  collector  may  degrade  performance 
significantly.  We  believe  that  this  is  mostly  due  to  the  overhead  of 
recording  pointer  storage. 

•  Ephemeral  space  configuration  should  be  tuned  to  individual  problems 
to  avoid  extra  transporting  work. 


Additionally, ephemeral garbage collections do not cause noticeable pauses.
This, and the performance characteristics described above, promise much for
the manufacturer of interactive systems built in LISP and delivered on
workstations. Ephemeral garbage collection allows smaller workstations
without local disks to be used as symbolic processing delivery vehicles
without prohibitive effects on performance.


8.2  Future  Work 

We  did  not  establish  conclusively  that  our  means  for  recording  the  root  set 
was  superior  to  card-marking.  More  performance  measurements  should  be 
made  to  support  or  controvert  this  argument. 
We do not know for certain that overall performance is actually enhanced by
our strategy of delaying until garbage collection time the determination of
the spaces in which a pointer is stored and to which it points.
Instrumentation of the pointer storage routine and careful metering of
paging behavior will yield an answer to this question.

A scheme by which repeated scanning of the stack on scavenges of successive
levels could be avoided would be useful; the stack often grows to
considerable size during the execution of LISP programs, as recursion is
encouraged. The obvious means by which to avoid scans after the initial one
would be to use the EBPLs to hold backpointers to ephemeral references on
the stack; however, this would mean less efficient use of EBPL conses, as
many EBPL entries for the stack would become invalid between invocations of
the garbage collector. As we would need to scan the stack at the inception
of each invocation of the garbage collector, we could modify the initial
scan of EBPLs (which elides entries for locations whose MBT entries are set)
to remove entries for locations on the stack.

However,  because  most  ephemeral  garbage  collections  are  only  of  the  first 
ephemeral  level,  the  additional  bookkeeping  overhead  of  this  scheme  might 
not  pay  off.  Possibly  we  could  invoke  it  only  when  the  succeeding  level 
was  filled  nearly  to  capacity,  as  this  would  signify  increased  likelihood  of  a 
scavenge  of  that  level. 

It would be useful to have a means of determining the best configuration
of ephemeral space for a given problem. The configuration of ephemeral
spaces is a compromise between several conflicting constraints. Increasing
the number of ephemeral levels causes fewer objects to be advanced to
dynamic space, decreasing the number of dynamic garbage collections;
however, it increases the number of times a relatively permanent object must
be copied before it is advanced to dynamic space. Increasing the size of the
first level will give ephemeral objects more time to perish before we ever
transport them; but it also reduces locality of reference. A good model of
these constraints would allow us to build a tool that could analyze the
dynamic behavior of a program and make configuration recommendations, or
possibly even vary dynamically the configuration of ephemeral space.

Some means of performing approximately depth-first copying (either that
used by Moon [10, page 238], or some other scheme) would improve locality
of reference.

Of  course,  the  availability  of  support  from  virtual  memory  systems  would 
make  a  scheme  like  the  one  described  in  Section  2.3.2  more  attractive  than 
the  one  we  implemented. 
Part  III 

Appendices 

A  Notes 

A.1 Performance of incremental garbage collection

Baker [1, page 26] noted that his incremental garbage collector would
require a larger working set size than would a simple stop-and-copy garbage
collector, as the user computation running would require a certain amount of
working storage, made larger by the necessity of following forwarding
pointers through evacuated objects in fromspace, and the scavenger would
also constantly be cycling through memory in tospace and fromspace unrelated
to that currently in use by the user computation.

Empirical evidence bears out Baker’s concern. Moon [10, page 236] reports
that the poor virtual memory performance of the Baker-style garbage
collector on the Symbolics 3600 resulted in users’ avoidance of its use
whenever possible.


A.2 Shaw’s suggested extension to virtual memory systems

Shaw  [12]  suggests  a  scheme  in  which  the  virtual  memory  system  allows  the 
LISP  process  to  clear  the  dirty  bits  actually  maintained  by  the  hardware; 
prior  to  actually  clearing  the  bits,  the  virtual  memory  system  saves  their 
state  away,  so  that  two  tables  are  consulted  by  the  virtual  memory  system 
when  a  page  is  ejected  from  physical  memory. 

Such  a  facility  would  allow  the  LISP  process  to  remember  and  clear  mark  bits 
just  before  a  garbage  collection.  During  the  garbage  collection,  it  would  scan 
the  pages  remembered  to  have  been  marked;  it  would  write  them  if  they  had 
ephemeral  references,  as  these  references  would  need  to  be  updated.  Thus 
only  the  pages  with  ephemeral  references  would  be  marked  dirty  immediately 
after  a  garbage  collection;  this  would  eliminate  some  useless  page-scanning. 
Note,  however,  that  the  facility  Shaw  suggests,  while  potentially  valuable, 
is  not  essential  to  the  success  of  Moon’s  garbage  collection  scheme  on  a 
general-purpose  computer.  Dirty  bits  alone,  maintained  in  the  usual  sense, 
will  work,  because  the  wasted  page  scans  are  of  in-core  pages;  the  expense  of 
this  operation,  even  on  general-purpose  machines,  is  dwarfed  by  any  backing 
store  operations  that  garbage  collection  may  require. 

It  is  interesting  to  note  that  Shaw’s  scheme  does  not  provide  any  means 
to  verify  whether  ejected  dirty  pages  contain  pointers  to  ephemeral  levels 
before  they  are  written  to  backing  store;  thus  unnecessary  backing  store 
operations  will  occur  at  garbage  collection  time  when  a  dirty  page  ejected 
from  physical  memory  does  not  contain  any  references  to  ephemeral  spaces. 


A.3 Time required to garbage-collect all levels

Call the oldest level level 1, and the youngest level level $m$. If each
succeeding level is half the size of the next younger level, and level $m$
is of size $x$, the $n$th level is of size $2^{n-m}x$. The data in the $m$th
level might have to be copied $m$ times. Say that the work required to copy
$x$ words is $x$; then, if every level is entirely full of live data, the
work required for $m$ levels is

$$\sum_{n=1}^{m} m \, 2^{n-m} x,$$

or $mx \sum_{n=1}^{m} 2^{n-m}$. In the limit, this is $2mx$, where $x$ was
the work required to copy the youngest level.
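
As a numerical check on the limit, the sketch below evaluates the bound $mx \sum_{n=1}^{m} 2^{n-m}$ for several values of $m$, reading it as charging every level $m$ copies. The function name `work_bound` is ours, and the closed form $mx(2 - 2^{1-m})$ used in the comment is a standard geometric-series identity rather than something stated in the text.

```python
def work_bound(m, x=1.0):
    """m * x * sum of 2**(n - m) for n = 1..m, equal to m*x*(2 - 2**(1 - m))."""
    return m * x * sum(2.0 ** (n - m) for n in range(1, m + 1))


for m in (1, 2, 4, 8, 16):
    print(f"m = {m:2}   bound = {work_bound(m):9.4f}   2mx = {2 * m}")
```

The bound is 7.5x for four levels and is already within a fraction of a percent of 2mx by sixteen levels, so the shortfall from 2mx shrinks rapidly as levels are added.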


A.4 Scanning order and virtual memory performance

On  the  3600,  an  ephemeral  garbage  collection  first  scavenges  the  pages  with 
GCPT  bits  set,  and  then  pages  whose  ESRT  entries  are  set.  This  probably 
results  in  many  pages  being  scanned  twice,  but,  as  Moon  comments,  it  is 
very  cheap  to  scan  a  page  with  no  ephemeral  references  on  it,  and  the  second 
scan  will  encounter  no  references  in  fromspace. 

In  our  card-marking  scheme,  we  scan  the  cards  with  set  secondary  marks 
first,  and  then  the  cards  whose  primary  marks  are  set,  but  whose  secondary 
marks  were  not  set;  thus  we  avoid  re-scanning  some  cards.  We  perform 
the  scanning  in  this  order  to  optimize  the  overall  paging  behavior  of  the 
algorithm.  If  we  wished,  we  could  first  scan  the  cards  whose  primary  marks 
were  set;  when  we  were  finished  scanning  them,  the  data  would  have  changed 
levels  and  the  secondary  card  mark  table  would  reflect  that,  so  we  would 
also  not  scan  any  cards  twice  with  this  approach. 
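
The scanning order we use can be stated compactly. In the sketch below, the two card mark tables are modelled as sets of card indices, which is an assumption made for brevity, and `scan_order` is a hypothetical name.

```python
def scan_order(primary, secondary):
    """Cards to scan: those with secondary marks first, then primary-only."""
    first = sorted(secondary)
    second = sorted(primary - secondary)   # skip cards already scanned
    return first + second


primary = {2, 5, 7}     # cards whose primary marks are set
secondary = {1, 5}      # cards whose secondary marks are set
print(scan_order(primary, secondary))   # [1, 5, 2, 7]
```

Card 5 carries both marks but appears only once, which is how re-scanning is avoided.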

Scanning  the  cards  with  set  secondary  marks  first  optimizes  the  paging 
behavior  in  this  way:  the  cards  whose  primary  card  marks  were  set  are  those 
that  were  recently  written,  and  thus  we  assume  that  they  may  be  written 
again  soon  and  should  be  in  physical  memory  after  a  garbage  collection;  if 
we  were  to  scan  them  first,  the  cards  whose  secondary  marks  were  set  would 
be  left  in  physical  memory  after  a  garbage  collection;  this  would  mean  that 
the  user  program  would  first  have  to  page  them  out. 

One  reason  why  Symbolics  might  perform  the  scan  in  the  opposite  order 
is  that,  because  their  garbage  collector  is  incremental,  the  user  program 
continues  to  execute  very  early  in  the  garbage  collection  process,  and  thus 
they  would  like  to  delay  disturbing  the  pages  in  physical  memory  as  long  as 
possible. 


A.5 Updating EBPLs between garbage collections

It  would  be  possible  to  update  the  EBPLs  from  the  modification  tables  (and 
clear  the  modification  tables)  between  garbage  collections;  for  example,  we 
could  cause  the  object  creation  routines  to  scan  modification  tables  whenever 
we  began  allocating  objects  in  a  new  segment.  This  would  not  violate  the 
design  goal  of  predictability,  which  states  that  programs  that  do  not  create 
objects  should  not  cause  garbage  collections,  because  the  scanning  would  be 
motivated  only  by  object  creation.  This  might  lead  to  better  virtual  memory 
performance,  as  the  garbage  collector  would  not  then  have  to  examine  such 
chronologically  distant  locations.  One  result  of  making  this  modification  to 
the system would be the possibility that level 0’s EBPL would have entries
at  the  inception  of  garbage  collection. 